Abstract. This article studies the problem of approximating functions belonging to a Hilbert space $H_d$ with an isotropic or anisotropic Gaussian reproducing kernel,

$K_d(x,t)=\exp\Big(-\sum_{\ell=1}^d\gamma_\ell^2(x_\ell-t_\ell)^2\Big)$ for all $x,t\in\mathbb{R}^d$.

The isotropic case corresponds to using the same shape parameters for all coordinates, namely $\gamma_\ell=\gamma>0$ for all $\ell$, whereas the anisotropic case corresponds to varying shape parameters $\gamma_\ell$. We are especially interested in moderate to large d.
We consider two classes of algorithms: 

(1) using finitely many arbitrary linear functionals, 

(2) using only finitely many function values. 

The pertinent error criterion is the worst case of such an algorithm over the unit ball in $H_d$, with the error for a single function given by the $\mathcal{L}_2$ norm, also with a Gaussian weight.

Since the Gaussian kernel is analytic, the minimal worst case errors of algorithms that use at most n linear functionals or n function values vanish like $O(n^{-p})$ as n goes to infinity. Here, p can be arbitrarily large, but the leading coefficient may depend on d (Theorem 1). On the other hand, if the dependence on d is taken into account, the convergence rate may be quite slow. If the goal is to make the error smaller than $Cn^{-p}$ for some C independent of d or polynomially dependent on d, then this is possible for any choice of shape parameters with the largest p equal to 1/2, provided that arbitrary linear functional data is available (Theorem 2). If the sequence of shape parameters $\gamma_\ell$ decays to zero like $\ell^{-\omega}$ as $\ell$ (and therefore also d) tends to $\infty$, then the largest p is roughly $\max(1/2,\omega)$ (Theorem 3). If only function values are available, dimension-independent convergence rates are somewhat worse (Theorems 4 and 5).

If the goal is to make the error smaller than $Cn^{-p}$ times the initial ($n=0$) error, then the corresponding p is roughly $\omega$. Therefore it is the same as before iff $\omega\ge 1/2$ (Theorem 7 and Corollary 2). In particular, for the isotropic case, when $\omega=0$, the error does not even decay polynomially with $n^{-1}$ (Theorem 6). In summary, excellent dimension-independent error decay rates are only possible when the sequence of shape parameters decays rapidly.
1. Introduction

Algorithms for function approximation based on symmetric, positive definite kernels are important and fundamental tools for numerical computation [2, 5, 13, 23] and statistical learning [1, 4, 8, 12, 14, 16, 17, 20], and are often used in engineering applications [6]. These algorithms go by a variety of names, including radial basis function methods [2], scattered data approximation [23], meshfree methods [5], (smoothing) splines [20], kriging [16], Gaussian process models [12] and support vector machines [17].

In a typical application we are given noisy or noiseless scalar or vector data. For simplicity, this article treats only the noiseless scalar case in which the data is of the form $y_i=f(x_i)$ or $y_i=L_i(f)$ for $i=1,\dots,n$. That is, a function f is sampled at the locations $\{x_1,\dots,x_n\}$, usually referred to as the data sites or the design, or more generally we know the values of n linear functionals $L_i$ on f. Here we assume that the domain of f is a subset of $\mathbb{R}^d$. One then chooses a symmetric, positive definite kernel $K_d$ (see (3) below for the specific requirements), ideally such that $f\in H(K_d)$, where $H(K_d)$ is a reproducing kernel Hilbert space with the reproducing kernel $K_d$. Then it is a good idea to construct an approximation $S_n(f)$ to f which has the minimal norm among all elements in $H(K_d)$ that interpolate the data. This corresponds to the spline algorithm and requires the solution of an $n\times n$ system of linear equations. While the spline algorithm is optimal in the sense explained in Section 2 below, there still remain the important questions of how fast $S_n(f)$ converges to f as the number of data n tends to infinity, and how to choose the data sites or linear functionals to maximize the rate of convergence. Another question is how the error bounds depend on d. The latter question is especially important when d is large.

The typical convergence rates (see, e.g., [5, 23]) are of the form $O(n^{-\beta/d})$, where $\beta$ denotes the smoothness of the kernel $K_d$, and the design is chosen optimally. Unfortunately, for a finite $\beta$, this means that as the dimension increases, these known convergence rates deteriorate dramatically. Furthermore, the dimension dependence of the leading constant in the big O-term is usually not known in these estimates.

This article studies Hilbert spaces with reproducing kernels $K_d:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}$. The kernel is called translation invariant or stationary if $K_d(x,t)=\widetilde{K}_d(x-t)$. In particular, the kernel is radially symmetric or isotropic if $K_d(x,t)=\kappa(\|x-t\|_2^2)$, in which case the kernel is called a radial (basis) function.

A kernel commonly used in practice, and one which is studied here, is the isotropic Gaussian kernel:

(1a) $K_d(x,t)=e^{-\gamma^2\|x-t\|_2^2}$ for all $x,t\in\mathbb{R}^d$,

where a positive $\gamma$ is called the shape parameter. This parameter functions as an inverse length scale. Choosing $\gamma$ very small has a beneficial effect on the rate of decay of the eigenvalues of the Gaussian kernel, as is shown below. An anisotropic, but stationary generalization of the Gaussian kernel is obtained by introducing a different positive shape parameter $\gamma_\ell$ for each variable:

(1b) $K_d(x,t)=e^{-\gamma_1^2(x_1-t_1)^2-\cdots-\gamma_d^2(x_d-t_d)^2}$ for all $x,t\in\mathbb{R}^d$.
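As a minimal sketch (Python with NumPy is our choice, and the function names are hypothetical), kernels (1a) and (1b) can be evaluated directly; taking all shape parameters equal recovers the isotropic kernel, while decaying shape parameters make later coordinates matter less.

```python
import numpy as np


def gaussian_kernel(x, t, gammas):
    """Anisotropic Gaussian kernel (1b): exp(-sum_l gamma_l^2 (x_l - t_l)^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(t, dtype=float)
    g2 = np.asarray(gammas, dtype=float) ** 2
    return float(np.exp(-np.sum(g2 * diff ** 2)))


def isotropic_gaussian_kernel(x, t, gamma):
    """Isotropic Gaussian kernel (1a): exp(-gamma^2 ||x - t||_2^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(t, dtype=float)
    return float(np.exp(-gamma ** 2 * np.dot(diff, diff)))


x, t = [0.2, -1.0, 0.5], [1.0, 0.3, 0.0]
# Equal shape parameters recover the isotropic kernel.
print(gaussian_kernel(x, t, [0.7, 0.7, 0.7]), isotropic_gaussian_kernel(x, t, 0.7))
# Decaying shape parameters down-weight the later coordinates.
print(gaussian_kernel(x, t, [1.0, 0.5, 0.25]))
```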
Table 1. Error decay rates as a function of sample size n

Data available: arbitrary linear functionals
    Absolute error criterion: $\asymp n^{-\max(r(\gamma),\,1/2)}$ (Theorem 3)
    Normalized error criterion: $\asymp n^{-r(\gamma)}$ if $r(\gamma)>0$ (Theorem 7)
Data available: function values
    Absolute error criterion: $\preceq n^{-\max(r(\gamma)/[1+1/(2r(\gamma))],\,1/4)}$ (Theorems 4 and 5)
    Normalized error criterion: $\preceq n^{-r(\gamma)/[1+1/(2r(\gamma))]}$ if $r(\gamma)>1/2$ (Corollary 2)

As evidence of its popularity, we note that this latter kernel is used in the Gaussian process modeling module of the JMP commercial statistical software [9]. In JMP, the values of the $\gamma_\ell$ are determined in a data-driven way¹.

We stress that the Gaussian kernels are analytic, and the smoothness parameter is $\beta=\infty$. Therefore one can hope to obtain convergence rates of the form $O(n^{-\tau})$ for an arbitrarily large $\tau$. As we shall see, this is indeed the case. This is shown in Theorem 1 and explained in Section 4. However, the dependence on d is a function of $\tau$, and only for a relatively small $\tau$ is the dependence on d acceptable.

Given the growing number of applications with moderate to large dimension, d, it is desirable to have dimension-independent polynomial convergence rates of the form $Cn^{-p}$ for positive C and p, which corresponds to strong polynomial tractability, or at worst, convergence rates that are polynomially dependent on dimension d and are of the form $Cd^qn^{-p}$ for positive C, q and p, which corresponds to polynomial tractability.

This paper establishes convergence rates with polynomial or no dimension dependence for the Gaussian kernel introduced in (1). The rates are summarized in Table 1. As explained in Section 2, the absolute error is the $\mathcal{L}_2$ worst case approximation error based on a Gaussian weight with mean zero and variance 1/2. The normalized error is the absolute error divided by $\|I_d\|$, where $I_d$ denotes the embedding between the radial function space and the $\mathcal{L}_2$ space. Note that the norm $\|I_d\|$ is the initial error that can be achieved by the zero algorithm without sampling the functions. The dimension-independent convergence rates depend to some extent on which error criterion is used. They also depend on whether the data available consists only of function values or, more generally, of arbitrary linear functionals. This latter, more generous setting may allow for faster convergence.

The notation $\preceq n^{-p}$ in Table 1 means that for all $\delta>0$ the error is bounded above by $Cn^{-p+\delta}$ for some constant C that is independent of the sample size, n, and the dimension, d, but it may depend on $\delta$. The notation $\succeq n^{-p}$ is defined analogously, and means that the error is bounded below by $Cn^{-p-\delta}$ for all $\delta>0$. The notation $\asymp n^{-p}$ means that the error is both $\preceq n^{-p}$ and $\succeq n^{-p}$.

As can be seen in Table 1, the convergence rates depend strongly on how fast the sequence of shape parameters $\gamma=\{\gamma_\ell\}_{\ell\in\mathbb{N}}$ goes to zero. The term $r(\gamma)$ appearing in Table 1 is defined by

(2) $r(\gamma)=\sup\Big\{\beta>0 \;\Big|\; \sum_{\ell=1}^\infty\gamma_\ell^{1/\beta}<\infty\Big\}$,

with the convention that the supremum of the empty set is taken to be zero.

¹In the tractability literature, the shape parameters $\gamma_\ell$ are called product weights.

Table 2. Number of data, $n(\varepsilon,H_d)$, needed to obtain an error tolerance $\varepsilon$

Data available: arbitrary linear functionals
    Absolute error criterion: $\asymp \varepsilon^{-\min(1/r(\gamma),\,2)}$ (Theorem 3)
    Normalized error criterion: $\asymp \varepsilon^{-1/r(\gamma)}$ if $r(\gamma)>0$ (Theorem 7)
Data available: function values
    Absolute error criterion: $\preceq \varepsilon^{-\min(1/r(\gamma)+1/[2r^2(\gamma)],\,4)}$ (Theorems 4 and 5)
    Normalized error criterion: $\preceq \varepsilon^{-1/r(\gamma)-1/[2r^2(\gamma)]}$ if $r(\gamma)>1/2$ (Corollary 2)

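For instance, for the isotropic case with $\gamma_\ell=\gamma>0$ we have $r(\gamma)=0$, whereas for $\gamma_\ell=\ell^{-\alpha}$ with a nonnegative $\alpha$ we have $r(\gamma)=\alpha$. If the $\gamma_\ell$ are ordered, that is, $\gamma_1\ge\gamma_2\ge\cdots$, then this definition is equivalent to

$r(\gamma)=\sup\Big\{\beta\ge0 \;\Big|\; \lim_{\ell\to\infty}\gamma_\ell\,\ell^{\beta}=0\Big\}$.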
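For an ordered sequence, one can check from the equivalent form above that $r(\gamma)=\max\{0,\liminf_{\ell\to\infty}\ln(\gamma_\ell^{-1})/\ln\ell\}$. The Python sketch below is our own heuristic, not a method from this paper: it approximates the liminf by a minimum over a finite window of indices, so it can only suggest the value of $r(\gamma)$. The function name and the window size are assumptions. For the isotropic case the estimate approaches 0 only logarithmically slowly.

```python
import math


def estimate_r(gamma, L=10_000):
    """Heuristic estimate of r(gamma) for a nonincreasing sequence gamma_l.

    Approximates max(0, liminf ln(1/gamma_l) / ln(l)) by the minimum of the
    ratio over the finite window L/2 <= l <= L; `gamma` maps l to gamma_l.
    """
    start = max(2, L // 2)                         # ln(1) = 0, so skip l = 1
    ratios = (math.log(1.0 / gamma(l)) / math.log(l) for l in range(start, L + 1))
    return max(0.0, min(ratios))


print(estimate_r(lambda l: l ** -2.0))   # gamma_l = l^{-2}: r(gamma) = 2
print(estimate_r(lambda l: l ** -0.5))   # gamma_l = l^{-1/2}: r(gamma) = 1/2
print(estimate_r(lambda l: 0.5))         # isotropic: r(gamma) = 0, approached slowly
```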

For excellent dimension independent convergence one needs the sequence of shape 
parameters to decay to zero quickly, as can be seen in Table 1. These results are 
derived in Sections 5 and 6. 

While writing the error as a function of the sample size is common in the numerical analysis literature, the computational complexity literature looks at the number of data required to obtain a given error tolerance. Let $n(\varepsilon,H_d)$ denote the minimal number of function values or linear functionals that are needed to compute an $\varepsilon\cdot\mathrm{CRI}_d$ approximation. Here, $\mathrm{CRI}_d=1$ for the absolute error criterion, and $\mathrm{CRI}_d=\|I_d\|$ for the normalized error criterion. Again, $\|I_d\|$ is the initial error that can be achieved by the zero algorithm without sampling the functions. The tractability results presented in this paper are summarized in Table 2.

For the absolute error and algorithms that use arbitrary linear functionals, we prove strong polynomial tractability for all choices of shape parameters $\gamma_\ell$. Furthermore, the exponent 2 of $\varepsilon^{-1}$ is best possible for all $\gamma_\ell$'s that go to zero no faster than $\ell^{-1/2}$. For the absolute error and algorithms that use function values, we still have strong polynomial tractability with exponent at most 4.

For the normalized error, the situation is much worse. If the sequence of shape parameters tends to zero fast enough, we still have strong polynomial tractability. However, for the isotropic case it follows that $n(\varepsilon,H_d)$ does not depend polynomially on $\varepsilon^{-1}$ and d. For algorithms using arbitrary linear functionals, we have quasi-polynomial tractability, i.e., there are positive C and t such that

$n(\varepsilon,H_d)\le C\exp\big(t\,(1+\ln d)(1+\ln\varepsilon^{-1})\big)$ for all $\varepsilon\in(0,1)$ and $d\in\mathbb{N}$.




Furthermore, the smallest t is roughly^ 

As a prelude to deriving these convergence and tractability results, the next section 
reviews some principles of function approximation on Hilbert spaces. Section 3 applies 
these principles to the Gaussian kernel. 



2. Function Approximation 

Let $H_d=H(K_d)$ denote a reproducing kernel Hilbert space of real functions defined on a Lebesgue measurable set $D_d\subseteq\mathbb{R}^d$. The goal is to accurately approximate any function in $H_d$ given a finite number of data about it. The reproducing kernel

$K_d: D_d\times D_d\to\mathbb{R}$

is symmetric, positive definite and reproduces function values. This means that for all $n\in\mathbb{N}$, $x,t,x_1,x_2,\dots,x_n\in D_d$, $c=(c_1,c_2,\dots,c_n)\in\mathbb{R}^n$ and $f\in H_d$, the following properties hold:

(3a) $K_d(\cdot,x)\in H_d$,

(3b) $K_d(x,t)=K_d(t,x)$,

(3c) $\sum_{i,j=1}^n K_d(x_i,x_j)\,c_ic_j\ge0$,

(3d) $f(x)=\langle f,K_d(\cdot,x)\rangle_{H_d}$.

For an arbitrary $x\in D_d$ consider the linear functional $L_x(f)=f(x)$ for all $f\in H_d$. Then $L_x$ is continuous and $\|L_x\|_{H_d^*}=K_d^{1/2}(x,x)$. The reader may find these and other properties in, e.g., [1, 20]. Many reproducing kernels are used in practice. A popular choice is the Gaussian kernel defined in (1), for which $D_d=\mathbb{R}^d$.
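Properties (3b) and (3c) can be checked numerically for the Gaussian kernel (1b): the Gram matrix $(K_d(x_i,x_j))_{i,j}$ should be symmetric with no negative eigenvalues, up to rounding. The Python/NumPy sketch below is an illustration under our own assumptions (random sites, arbitrary shape parameters, hypothetical function names); a numerical check of this kind is evidence, not a proof.

```python
import numpy as np


def gram_matrix(points, gammas):
    """Gram matrix (K_d(x_i, x_j))_{i,j=1}^n of the anisotropic Gaussian kernel (1b)."""
    P = np.asarray(points, dtype=float)
    g2 = np.asarray(gammas, dtype=float) ** 2
    diff = P[:, None, :] - P[None, :, :]          # pairwise differences, shape (n, n, d)
    return np.exp(-np.sum(g2 * diff ** 2, axis=-1))


rng = np.random.default_rng(0)
pts = rng.normal(size=(30, 4))                    # 30 sites in R^4
G = gram_matrix(pts, [1.0, 0.5, 0.25, 0.125])

sym_err = np.max(np.abs(G - G.T))                 # (3b): symmetry
min_eig = np.linalg.eigvalsh(G).min()             # (3c): no negative eigenvalues
c = rng.normal(size=30)
quad_form = c @ G @ c                             # (3c) for one coefficient vector
print(sym_err, min_eig, quad_form)
```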

It is assumed that $H_d$ is continuously embedded in the space $\mathcal{L}_2=\mathcal{L}_2(D_d,\varrho_d)$ of square Lebesgue integrable functions. Here, $\varrho_d$ is a probability density function, i.e., $\varrho_d\ge0$ and $\int_{D_d}\varrho_d(t)\,dt=1$. The norm in the space $\mathcal{L}_2$ is given by

$\|f\|_{\mathcal{L}_2}=\left(\int_{D_d}f^2(t)\,\varrho_d(t)\,dt\right)^{1/2}$.

Continuous embedding means that the linear embedding operator $I_d:H_d\to\mathcal{L}_2$ given by $I_df=f$ is continuous,

$\|I_df\|_{\mathcal{L}_2}\le\|I_d\|\,\|f\|_{H_d}$ for all $f\in H_d$.



²In this paper, ln denotes the natural logarithm.
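Observe that, by (3d), the Cauchy-Schwarz inequality and $\|K_d(\cdot,t)\|_{H_d}^2=K_d(t,t)$,

$\|I_df\|_{\mathcal{L}_2}^2=\int_{D_d}f^2(t)\,\varrho_d(t)\,dt=\int_{D_d}\langle f,K_d(\cdot,t)\rangle_{H_d}^2\,\varrho_d(t)\,dt\le\|f\|_{H_d}^2\int_{D_d}K_d(t,t)\,\varrho_d(t)\,dt$.

Hence, it is enough to assume that

(4) $\int_{D_d}K_d(t,t)\,\varrho_d(t)\,dt<\infty$

to guarantee that $I_d$ is continuous, and obviously

$\|I_d\|\le\left(\int_{D_d}K_d(t,t)\,\varrho_d(t)\,dt\right)^{1/2}$.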

Functions in $H_d$ are approximated by linear algorithms³

(5) $A_n(f)=\sum_{j=1}^nL_j(f)\,a_j$ for all $f\in H_d$,

for some continuous linear functionals $L_j\in H_d^*$ and functions $a_j\in\mathcal{L}_2$. The worst case error of the algorithm is then defined as

$e^{\mathrm{wor}}(A_n)=\sup_{\|f\|_{H_d}\le1}\|f-A_n(f)\|_{\mathcal{L}_2}$.

The linear algorithms $A_n$ considered here are based on function data $L_j(f)$, where the continuous linear functionals $L_j$ may belong to one of two classes. The first class, denoted $\Lambda^{\mathrm{std}}$, consists only of function values and is called standard. That is, $L_j\in\Lambda^{\mathrm{std}}$ iff $L_j(f)=f(t_j)$ for all $f\in H_d$ for some $t_j\in D_d$. The second class, denoted $\Lambda^{\mathrm{all}}$, consists of arbitrary continuous linear functionals and is called linear. That is, $L_j\in\Lambda^{\mathrm{all}}$ iff $L_j\in H_d^*$. Obviously, $\Lambda^{\mathrm{std}}\subseteq\Lambda^{\mathrm{all}}$.

The aim is to determine how small the worst case error can be by choosing linear algorithms with only n linear functionals either from $\Lambda^{\mathrm{std}}$ or $\Lambda^{\mathrm{all}}$. The nth minimal worst case error is defined as

$e^{\mathrm{wor}\text{-}\vartheta}(n,H_d)=\inf_{A_n\text{ with }L_j\in\Lambda^{\vartheta}}e^{\mathrm{wor}}(A_n)$, $\vartheta\in\{\mathrm{std},\mathrm{all}\}$.

Here and below, for notational simplicity, $\vartheta$ denotes either the standard or linear setting. Clearly, $e^{\mathrm{wor}\text{-}\mathrm{all}}(n,H_d)\le e^{\mathrm{wor}\text{-}\mathrm{std}}(n,H_d)$ since the former uses a larger class of function data.



³It is well known that adaption and nonlinear algorithms do not help for approximation of linear problems. A linear problem is defined as a linear operator and we approximate its values over a set that is convex and balanced. The typical example of such a set is the unit ball as taken in this paper. Then among all algorithms that use linear adaptive functionals, the worst case error is minimized by a linear algorithm that uses nonadaptive linear functionals. Adaptive choice of a linear functional means that the choice of $L_j$ in (5) may depend on the already computed values $L_i(f)$ for $i=1,2,\dots,j-1$. That is why, in our case, the restriction to linear algorithms of the form (5) can be made without loss of generality; for more detail see, e.g., [19].
The case $n=0$ means that no linear functionals of f are used to construct the algorithm. It is easy to see that the best algorithm possible is $A_0=0$, and then

$e^{\mathrm{wor}\text{-}\vartheta}(0,H_d)=\|I_d\|$.

The minimal error for $n=0$ is called the initial error and it only depends on the formulation of the problem.

This article addresses two problems: convergence and tractability. The former con- 
siders how fast the error vanishes as n increases, and the latter considers how the 
error depends on the dimension, d, as well as the number of data, n. 

Problem 1: Rate of Convergence 

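We would like to know how fast $e^{\mathrm{wor}\text{-}\vartheta}(n,H_d)$ goes to zero as n goes to infinity. In particular, we study the rate of convergence (defined as in (2)) of the sequence $\{e^{\mathrm{wor}\text{-}\vartheta}(n,H_d)\}_{n\in\mathbb{N}}$. Since the numbers $e^{\mathrm{wor}\text{-}\vartheta}(n,H_d)$ are ordered, we have

(6) $r^{\mathrm{wor}\text{-}\vartheta}(H_d):=r\big(\{e^{\mathrm{wor}\text{-}\vartheta}(n,H_d)\}\big)=\sup\Big\{\beta\ge0 \;\Big|\; \lim_{n\to\infty}e^{\mathrm{wor}\text{-}\vartheta}(n,H_d)\,n^{\beta}=0\Big\}$.

Roughly speaking, the rate of convergence is the largest $\beta$ for which the nth minimal errors behave no worse than $n^{-\beta}$. For example, if $e^{\mathrm{wor}\text{-}\vartheta}(n,H_d)=n^{-\alpha}$ for a positive $\alpha$, then $r^{\mathrm{wor}\text{-}\vartheta}(H_d)=\alpha$. Under this definition, even sequences of the form $e^{\mathrm{wor}\text{-}\vartheta}(n,H_d)=n^{-\alpha}\ln^p n$ for an arbitrary p still have $r^{\mathrm{wor}\text{-}\vartheta}(H_d)=\alpha$. On the other hand, if $e^{\mathrm{wor}\text{-}\vartheta}(n,H_d)=q^n$ for a number $q\in(0,1)$, then $r^{\mathrm{wor}\text{-}\vartheta}(H_d)=\infty$.

Obviously, $r^{\mathrm{wor}\text{-}\mathrm{all}}(H_d)\ge r^{\mathrm{wor}\text{-}\mathrm{std}}(H_d)$. We would like to know both rates and whether

$r^{\mathrm{wor}\text{-}\mathrm{all}}(H_d)>r^{\mathrm{wor}\text{-}\mathrm{std}}(H_d)$,

i.e., whether $\Lambda^{\mathrm{all}}$ admits a better rate of convergence than $\Lambda^{\mathrm{std}}$.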
Problem 2: Tractability 

Assume that there is a sequence of spaces $\{H_d\}_{d\in\mathbb{N}}$ and embedding operators $\{I_d\}_{d\in\mathbb{N}}$. In this case, we would like to know how the minimal errors $e^{\mathrm{wor}\text{-}\vartheta}(n,H_d)$ depend not only on n but also on d.

More precisely, we consider the absolute and normalized error criteria. For a given (small) $\varepsilon\in(0,1)$ we want to find an algorithm $A_n$ with the smallest n for which the error does not exceed $\varepsilon$ for the absolute error criterion, and does not exceed $\varepsilon\,\|I_d\|$ for the normalized error criterion. That is,

$n^{\mathrm{wor}\text{-}\psi\text{-}\vartheta}(\varepsilon,H_d)=\min\big\{n \;\big|\; e^{\mathrm{wor}\text{-}\vartheta}(n,H_d)\le\varepsilon\,\mathrm{CRI}_d\big\}$, $\psi\in\{\mathrm{abs},\mathrm{norm}\}$,

where $\mathrm{CRI}_d=1$ for the absolute error criterion and $\mathrm{CRI}_d=\|I_d\|$ for the normalized error criterion.

Let $\mathcal{I}=\{I_d\}_{d\in\mathbb{N}}$ denote the sequence of function approximation problems. We say that $\mathcal{I}$ is polynomially tractable iff there exist numbers C, p and q such that

$n^{\mathrm{wor}\text{-}\psi\text{-}\vartheta}(\varepsilon,H_d)\le C\,d^{\,q}\,\varepsilon^{-p}$ for all $d\in\mathbb{N}$ and $\varepsilon\in(0,1)$.

If $q=0$ above, then we say that $\mathcal{I}$ is strongly polynomially tractable, and the infimum of p satisfying the bound above is called the exponent of strong polynomial tractability.

The essence of polynomial tractability is to guarantee that a polynomial number of linear functionals is enough to solve the function approximation problem to within $\varepsilon$. Obviously, polynomial tractability depends on which class, $\Lambda^{\mathrm{std}}$ or $\Lambda^{\mathrm{all}}$, is considered and whether the absolute or normalized error is used. As shall be shown, the results on polynomial tractability depend on the cases considered.

The property of strong polynomial tractability is especially challenging since then the number of linear functionals needed for an $\varepsilon$-approximation is independent of d. The reader may suspect that this property is too strong and cannot happen for function approximation. Nevertheless, there are positive results to report on strong polynomial tractability.

Besides polynomial tractability, there are the somewhat less demanding concepts such as quasi-polynomial tractability and weak tractability. The problem $\mathcal{I}$ is quasi-polynomially tractable iff there exist numbers C and t for which

$n^{\mathrm{wor}\text{-}\psi\text{-}\vartheta}(\varepsilon,H_d)\le C\exp\big(t\,(1+\ln d)(1+\ln\varepsilon^{-1})\big)$

for all $d\in\mathbb{N}$ and $\varepsilon\in(0,1)$. The exponent of quasi-polynomial tractability is defined as the infimum of t satisfying the bound above. Finally, $\mathcal{I}$ is weakly tractable iff

$\lim_{\varepsilon^{-1}+d\to\infty}\frac{\ln n^{\mathrm{wor}\text{-}\psi\text{-}\vartheta}(\varepsilon,H_d)}{\varepsilon^{-1}+d}=0$.

Note that for a fixed d, quasi-polynomial tractability means that

$n^{\mathrm{wor}\text{-}\psi\text{-}\vartheta}(\varepsilon,H_d)=O\big(\varepsilon^{-t(1+\ln d)}\big)$ as $\varepsilon\to0$.

Hence, the exponent of $\varepsilon^{-1}$ may now weakly depend on d through $\ln d$. On the other hand, weak tractability only means that we do not have exponential dependence on $\varepsilon^{-1}$ and d.

We will report on quasi-polynomial and weak tractability in the case when polynomial tractability does not hold. As before, quasi-polynomial and weak tractability depend on which class, $\Lambda^{\mathrm{std}}$ or $\Lambda^{\mathrm{all}}$, is considered and on the error criterion.

Motivation of tractability studies and more on tractability concepts can be found in [11]. Quasi-polynomial tractability has recently been studied in [7].



We end this section by briefly reviewing some general results related to the problems of convergence and tractability mentioned above. For a given design, i.e., given continuous linear functionals $L_1,\dots,L_n$, it is known how to find functions $a_j$ for which the worst case error of $A_n$ is minimized. The optimal algorithm, $S_n$, should be taken as the spline or the minimal norm interpolant, see, e.g., Section 5.7 of [19]. The spline algorithm was briefly mentioned in the introduction. It is described in more generality here.
For given $y_j=L_j(f)$ for $j=1,2,\dots,n$, we take as $S_n(f)$ an element of $H_d$ that satisfies the conditions

$L_j(S_n(f))=y_j$ for $j=1,2,\dots,n$,

$\|S_n(f)\|_{H_d}=\inf_{g\in H_d,\ L_j(g)=y_j,\ j=1,2,\dots,n}\|g\|_{H_d}$.

The construction of $S_n(f)$ may be done by solving a linear equation $\mathsf{K}c=y$, where $y=(y_1,y_2,\dots,y_n)^T$ and the $n\times n$ matrix $\mathsf{K}$ is given by

$\mathsf{K}=(k_{i,j})_{i,j=1}^n$ with $k_{i,j}=L_i(g_j)$ and $g_j(x)=L_jK_d(\cdot,x)$.

Then

$S_n(f)(x)=k^T(x)\,\mathsf{K}^{-1}y$ with $k(x)=\big(L_jK_d(\cdot,x)\big)_{j=1}^n$,

and

$e^{\mathrm{wor}}(S_n)=\sup_{\|f\|_{H_d}\le1,\ L_j(f)=0,\ j=1,2,\dots,n}\|I_df\|_{\mathcal{L}_2}$.

Note that depending on the choice of linear functionals $L_1,\dots,L_n$, the matrix $\mathsf{K}$ may not necessarily be invertible; however, the solution $c=\mathsf{K}^{-1}y$ is always well defined as the vector of minimal Euclidean norm which satisfies $\mathsf{K}c=y$.
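A minimal sketch of the spline algorithm for the class $\Lambda^{\mathrm{std}}$, assuming point evaluations $L_j(f)=f(x_j)$, so that $k_{i,j}=K_d(x_j,x_i)$ and $g_j=K_d(x_j,\cdot)$. It uses Python/NumPy; the function names, the 3 by 3 grid design and the test function are hypothetical choices made only for illustration. `numpy.linalg.lstsq` returns the minimal Euclidean norm least squares solution, which matches the convention above when $\mathsf{K}$ is singular.

```python
import numpy as np


def gauss_kernel_matrix(X, T, gammas):
    """Matrix (K_d(x_i, t_j))_{i,j} for the anisotropic Gaussian kernel (1b)."""
    g2 = np.asarray(gammas, dtype=float) ** 2
    diff = X[:, None, :] - T[None, :, :]
    return np.exp(-np.sum(g2 * diff ** 2, axis=-1))


def spline_coefficients(sites, y, gammas):
    """Solve K c = y; lstsq returns the minimal Euclidean norm solution."""
    K = gauss_kernel_matrix(sites, sites, gammas)   # k_{i,j} = K_d(x_i, x_j), symmetric
    c, *_ = np.linalg.lstsq(K, y, rcond=None)
    return c


def spline_eval(x, sites, c, gammas):
    """S_n(f)(x) = k(x)^T c with k(x) = (K_d(x_j, x))_{j=1}^n."""
    return gauss_kernel_matrix(np.atleast_2d(x), sites, gammas) @ c


# Hypothetical test function and a 3 x 3 grid design in [-1, 1]^2.
f = lambda X: np.sin(X[:, 0]) + X[:, 1] ** 2
g = np.linspace(-1.0, 1.0, 3)
sites = np.array([[a, b] for a in g for b in g])
gammas = [1.0, 0.5]
c = spline_coefficients(sites, f(sites), gammas)
interp_err = np.max(np.abs(spline_eval(sites, sites, c, gammas) - f(sites)))
approx = spline_eval(np.array([0.3, -0.4]), sites, c, gammas)[0]

# Smaller shape parameters make the matrix K much worse conditioned.
cond_big_gamma = np.linalg.cond(gauss_kernel_matrix(sites, sites, [1.0, 1.0]))
cond_small_gamma = np.linalg.cond(gauss_kernel_matrix(sites, sites, [0.1, 0.1]))
print(interp_err, approx, cond_big_gamma, cond_small_gamma)
```

The last two lines illustrate the practical caveat that a small shape parameter leads to a badly conditioned matrix $\mathsf{K}$.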

The spline enjoys more optimality properties. For instance, it minimizes the local worst case error. Roughly speaking, this means that for each $x\in D_d$, the worst possible pointwise error $|f(x)-A_n(f)(x)|$ over the unit ball of functions f is minimized over all possible $A_n$ by choosing $A_n=S_n$. We do not elaborate more on this point.

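It is non-trivial to find the linear functionals $L_j$ from the class $\Lambda^{\mathrm{std}}$ that minimize the error of the spline algorithm $S_n$. For the class $\Lambda^{\mathrm{all}}$ the optimal design is known, at least theoretically, see again, e.g., [19]. Namely, let $W_d=I_d^*I_d:H_d\to H_d$, where $I_d^*:\mathcal{L}_2\to H_d$ denotes the adjoint of the embedding operator, i.e., the operator satisfying $\langle f,I_d^*h\rangle_{H_d}=\langle I_df,h\rangle_{\mathcal{L}_2}$ for all $f\in H_d$ and $h\in\mathcal{L}_2$. As a consequence, $W_d$ is a self-adjoint and positive definite linear operator given by

$W_df=\int_{D_d}f(t)\,K_d(\cdot,t)\,\varrho_d(t)\,dt$ for all $f\in H_d$.

Clearly,

$\langle f,g\rangle_{\mathcal{L}_2}=\langle I_df,I_dg\rangle_{\mathcal{L}_2}=\langle f,I_d^*I_dg\rangle_{H_d}=\langle f,W_dg\rangle_{H_d}$ for all $f,g\in H_d$.

It is known that $\lim_{n\to\infty}e^{\mathrm{wor}\text{-}\mathrm{all}}(n,H_d)=0$ iff $W_d$ is compact. In particular, (4) implies that $W_d$ is compact.

Assuming that $W_d$ is compact, let us define its eigenpairs by $(\lambda_{d,j},\eta_{d,j})$, where the eigenvalues are ordered, $\lambda_{d,1}\ge\lambda_{d,2}\ge\cdots$, and

$W_d\eta_{d,j}=\lambda_{d,j}\eta_{d,j}$ with $\langle\eta_{d,j},\eta_{d,i}\rangle_{H_d}=\delta_{i,j}$ for all $i,j\in\mathbb{N}$.

Note also that for any $f\in H_d$ we have

$\langle f,\eta_{d,j}\rangle_{\mathcal{L}_2}=\langle I_df,I_d\eta_{d,j}\rangle_{\mathcal{L}_2}=\langle f,W_d\eta_{d,j}\rangle_{H_d}=\lambda_{d,j}\langle f,\eta_{d,j}\rangle_{H_d}$.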
Taking $f=\eta_{d,i}$ we see that $\{\eta_{d,j}\}$ is a set of orthogonal functions in $\mathcal{L}_2$. For simplicity and without loss of generality we assume that all $\lambda_{d,j}$ are positive⁴. Letting

$\varphi_{d,j}=\lambda_{d,j}^{-1/2}\eta_{d,j}$ for all $j\in\mathbb{N}$,

we obtain an orthonormal sequence $\{\varphi_{d,j}\}$ in $\mathcal{L}_2$. Since $\{\eta_{d,j}\}$ is a complete orthonormal basis of $H_d$, we have

(7) $K_d(x,t)=\sum_{j=1}^\infty\eta_{d,j}(x)\,\eta_{d,j}(t)=\sum_{j=1}^\infty\lambda_{d,j}\,\varphi_{d,j}(x)\,\varphi_{d,j}(t)$ for all $x,t\in D_d$.

If (4) holds, then

(8) $\sum_{j=1}^\infty\lambda_{d,j}=\int_{D_d}K_d(t,t)\,\varrho_d(t)\,dt<\infty$.

This means that (4) implies that $W_d$ is also a finite trace operator.

It is known that the best choice of $L_j$ for the class $\Lambda^{\mathrm{all}}$ is $L_j=\langle\cdot,\eta_{d,j}\rangle_{H_d}$. Then the spline algorithm $S_n$ with the minimal worst case error is defined using the eigenfunctions corresponding to the n largest eigenvalues:

$S_n(f)=\sum_{j=1}^n\langle f,\eta_{d,j}\rangle_{H_d}\,\eta_{d,j}$ for all $f\in H_d$,

and

$e^{\mathrm{wor}}(S_n)=e^{\mathrm{wor}\text{-}\mathrm{all}}(n,H_d)=\sqrt{\lambda_{d,n+1}}$ for all $n\ge0$.

The last formula for $n=0$ yields that the initial error is $\|I_d\|=\sqrt{\lambda_{d,1}}$.

The results for the class $\Lambda^{\mathrm{all}}$ are useful for finding rates of convergence as well as necessary and sufficient conditions on polynomial, quasi-polynomial and weak tractability in terms of the behavior of the eigenvalues $\lambda_{d,j}$. This has already been done in a number of papers and books, and we will report these results later for the spaces studied in this paper. For the class $\Lambda^{\mathrm{std}}$, the situation is much harder, although there are papers that relate rates of convergence and tractability conditions between the classes $\Lambda^{\mathrm{all}}$ and $\Lambda^{\mathrm{std}}$. Again, we report these results later.

3. Radial Function Spaces 

The focus of this article is on reproducing kernels that are translation invariant or stationary, i.e.,

$K_d(x,t)=\widetilde{K}_d(x-t)$ for all $x,t\in D_d=\mathbb{R}^d$.

An even more special case is for radially symmetric or isotropic kernels, i.e.,

$K_d(x,t)=\kappa(\|x-t\|_2^2)$ with $\|x-t\|_2^2=\sum_{\ell=1}^d(x_\ell-t_\ell)^2$.

Here, $\widetilde{K}_d$ and $\kappa$ are chosen such that $K_d$ is a reproducing kernel.



⁴Otherwise, we should switch to a subspace of $H_d$ spanned by eigenfunctions corresponding to k positive eigenvalues, and replace $\mathbb{N}$ by $\{1,2,\dots,k\}$.
Isotropic kernels also go by the name radial basis functions, and the spaces $H(K_d)$ are referred to as radial function spaces. Stationary or isotropic kernels are common in the literature on computational mathematics [2, 5, 23], statistics [1, 16, 20], statistical learning [12, 17], and engineering applications [6].

A popular isotropic kernel is the Gaussian kernel, defined in (1), which has both an isotropic version,

$K_d(x,t)=e^{-\gamma^2\|x-t\|_2^2}$ for all $x,t\in\mathbb{R}^d$,

and a more general anisotropic version,

$K_d(x,t)=e^{-\gamma_1^2(x_1-t_1)^2-\cdots-\gamma_d^2(x_d-t_d)^2}$ for all $x,t\in\mathbb{R}^d$.

As alluded to in the introduction, the shape parameter, $\gamma$ or $\gamma=\{\gamma_\ell\}_{\ell\in\mathbb{N}}$, which functions as an inverse length scale, plays an important role in the tractability of function approximation. Choosing the $\gamma_\ell$ to decay quickly has a beneficial effect on the rate of decay of the eigenvalues of the Gaussian kernel, as we shall see below. On the other hand, a small value of $\gamma$ leads to a huge condition number of the matrix $\mathsf{K}$ and may result in severe numerical instabilities. While this issue is very important for practical implementations, and has received some attention, we will not discuss it any further here.

We now analyze the function approximation problem for the Hilbert space $H_d=H(K_d)$ with the isotropic Gaussian kernel $K_d$ given by (1a) or, more generally, with the anisotropic Gaussian kernel given by (1b). For the space $\mathcal{L}_2(\mathbb{R}^d,\varrho_d)$ we take the Gaussian weight with zero mean and variance 1/2, i.e.,

$\varrho_d(t)=\frac{1}{\pi^{d/2}}\exp\big(-(t_1^2+t_2^2+\cdots+t_d^2)\big)$ for all $t\in\mathbb{R}^d$.

Note that $K_d(t,t)=1$ for all $t\in\mathbb{R}^d$, and therefore

$\int_{\mathbb{R}^d}K_d(t,t)\,\varrho_d(t)\,dt=1$,

so that (4) holds. This means that the embedding $I_d$ is continuous, and the operator $W_d$ is compact and a finite trace operator with

(9) $\sum_{j=1}^\infty\lambda_{d,j}=1$,

by (8).
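With this Gaussian weight, $\mathcal{L}_2$ norms can be approximated by tensor-product Gauss-Hermite quadrature, since `numpy.polynomial.hermite.hermgauss` returns nodes and weights for integrals of the form $\int_{\mathbb{R}}g(x)e^{-x^2}dx$. The sketch below is our own illustration (the function name `l2_norm` and the default degree are assumptions); its cost grows like $\mathrm{deg}^d$, so it is only practical for small d, which is exactly the regime this paper tries to move beyond.

```python
import itertools
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss


def l2_norm(f, d, deg=20):
    """||f||_{L2} w.r.t. rho_d(t) = pi^{-d/2} exp(-||t||^2), via tensor Gauss-Hermite."""
    nodes, weights = hermgauss(deg)     # integrates g(x) * exp(-x^2) over R
    total = 0.0
    for idx in itertools.product(range(deg), repeat=d):
        idx = list(idx)
        total += np.prod(weights[idx]) * f(nodes[idx]) ** 2
    return math.sqrt(total / math.pi ** (d / 2))


print(l2_norm(lambda s: 1.0, d=2))         # rho_d is a probability density: 1
print(l2_norm(lambda s: s[0], d=1))        # variance 1/2: sqrt(1/2)
print(l2_norm(lambda s: s[0] * s[1], d=2)) # product of two variances: 1/2
```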

Observe that (4) holds for all translation invariant kernels since

$\int_{\mathbb{R}^d}K_d(t,t)\,\varrho_d(t)\,dt=\widetilde{K}_d(0)$,

but this value can now depend on d. For radially symmetric kernels we have

$\int_{\mathbb{R}^d}K_d(t,t)\,\varrho_d(t)\,dt=\kappa(0)$,

and it is independent of d.
Since a Gaussian kernel is of the product form, the space $H_d=H(K_d)$ is the tensor product of the univariate Hilbert spaces with the kernels $e^{-\gamma_\ell^2(x-t)^2}$ for $x,t\in\mathbb{R}$. This also implies that the operator $W_d$ is of the product form and its eigenpairs are products of the corresponding eigenpairs for the univariate cases.

Consider now $d=1$, and the space $H(K_1)$ with $K_1(x,t)=e^{-\gamma^2(x-t)^2}$. Then the eigenpairs $(\lambda_{\gamma,j},\eta_{\gamma,j})$ of $W_1$ are known, see [12]. Note that we have introduced the notation $\lambda_{\gamma,j}$ to emphasize the dependence of the eigenvalues on $\gamma$ in the following discussion (while the dependence on d has temporarily been dropped from the notation). We have

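$\lambda_{\gamma,j}=\frac{1}{\sqrt{\frac12\big(1+\sqrt{1+4\gamma^2}\big)+\gamma^2}}\left(\frac{\gamma^2}{\frac12\big(1+\sqrt{1+4\gamma^2}\big)+\gamma^2}\right)^{j-1}=(1-\omega_\gamma)\,\omega_\gamma^{j-1}$,

where

(10) $\omega_\gamma=\frac{\gamma^2}{\frac12\big(1+\sqrt{1+4\gamma^2}\big)+\gamma^2}$,

and $\eta_{\gamma,j}=\lambda_{\gamma,j}^{1/2}\varphi_{\gamma,j}$ with

$\varphi_{\gamma,j}(x)=\sqrt{\frac{(1+4\gamma^2)^{1/4}}{2^{j-1}(j-1)!}}\,\exp\left(-\frac{\gamma^2x^2}{\frac12\big(1+\sqrt{1+4\gamma^2}\big)}\right)H_{j-1}\big((1+4\gamma^2)^{1/4}x\big)$,

where $H_{j-1}$ is the Hermite polynomial of degree $j-1$, given by

$H_{j-1}(x)=(-1)^{j-1}e^{x^2}\frac{d^{j-1}}{dx^{j-1}}e^{-x^2}$ for all $x\in\mathbb{R}$,

so that

$\int_{\mathbb{R}}H_{j-1}^2(x)\,e^{-x^2}\,dx=\sqrt{\pi}\,2^{j-1}(j-1)!$ for $j=1,2,\dots$.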

Obviously, we have 

and applying (7) we obtain

$K_1(x,t)=e^{-\gamma^2(x-t)^2}=\sum_{j=1}^\infty\lambda_{\gamma,j}\,\varphi_{\gamma,j}(x)\,\varphi_{\gamma,j}(t)$ for all $x,t\in\mathbb{R}$.

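Note that the eigenvalues $\lambda_{\gamma,j}$ are ordered. The largest eigenvalue is

$\lambda_{\gamma,1}=1-\omega_\gamma=\sqrt{\frac{2}{1+\sqrt{1+4\gamma^2}+2\gamma^2}}=1-\gamma^2+O(\gamma^4)$ as $\gamma\to0$.

Furthermore,

(11) $\lambda_{\gamma,j}=\big(1-\gamma^2+O(\gamma^4)\big)\big(\gamma^2+O(\gamma^4)\big)^{j-1}$ as $\gamma\to0$, for $j=1,2,\dots$.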
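The univariate eigenpairs can be checked numerically. The Python sketch below (NumPy; the function names and the truncation at 40 terms are our own choices) evaluates $\lambda_{\gamma,j}$ and $\varphi_{\gamma,j}$ from (10) and the formulas above, using `numpy.polynomial.hermite.hermval`, which evaluates physicists' Hermite series, for $H_{j-1}$. It then compares a truncated version of the expansion of $K_1(x,t)$ with the kernel itself.

```python
import math

import numpy as np
from numpy.polynomial import hermite   # physicists' Hermite polynomials H_k


def omega(gamma):
    """omega_gamma from (10)."""
    g2 = gamma * gamma
    return g2 / (0.5 * (1.0 + math.sqrt(1.0 + 4.0 * g2)) + g2)


def eigenvalue(gamma, j):
    """lambda_{gamma,j} = (1 - omega_gamma) * omega_gamma^(j-1), j = 1, 2, ..."""
    w = omega(gamma)
    return (1.0 - w) * w ** (j - 1)


def eigenfunction(gamma, j, x):
    """phi_{gamma,j}(x), orthonormal in L2 with weight pi^(-1/2) exp(-x^2)."""
    g2 = gamma * gamma
    s4 = (1.0 + 4.0 * g2) ** 0.25                      # (1 + 4 gamma^2)^(1/4)
    k = j - 1                                          # Hermite degree
    scale = math.sqrt(s4 / (2.0 ** k * math.factorial(k)))
    damping = math.exp(-g2 * x * x / (0.5 * (1.0 + math.sqrt(1.0 + 4.0 * g2))))
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0                                    # selects H_k alone
    return scale * damping * float(hermite.hermval(s4 * x, coeffs))


def mercer_kernel(gamma, x, t, terms=40):
    """Truncated expansion sum_j lambda_{gamma,j} phi_{gamma,j}(x) phi_{gamma,j}(t)."""
    return sum(eigenvalue(gamma, j) * eigenfunction(gamma, j, x) * eigenfunction(gamma, j, t)
               for j in range(1, terms + 1))


gamma, x, t = 1.0, 0.3, -0.5
print(mercer_kernel(gamma, x, t), math.exp(-gamma ** 2 * (x - t) ** 2))
```

Because the eigenvalues decay geometrically with ratio $\omega_\gamma<1$, a modest number of terms already reproduces the kernel closely for moderate x and t.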
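The space $H(K_1)$ consists of analytic functions for which

$\|f\|_{H(K_1)}^2=\sum_{j=1}^\infty\langle f,\eta_{\gamma,j}\rangle_{H(K_1)}^2=\sum_{j=1}^\infty\lambda_{\gamma,j}^{-1}\langle f,\varphi_{\gamma,j}\rangle_{\mathcal{L}_2}^2<\infty$.

This means that the coefficients of f in the space $\mathcal{L}_2$ decay exponentially fast. The inner product is obviously given as

$\langle f,g\rangle_{H(K_1)}=\sum_{j=1}^\infty\lambda_{\gamma,j}^{-1}\langle f,\varphi_{\gamma,j}\rangle_{\mathcal{L}_2}\langle g,\varphi_{\gamma,j}\rangle_{\mathcal{L}_2}$ for all $f,g\in H(K_1)$.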

The reader may find more about the characterization of the space H{Ki) in [18]. 

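For $d\ge1$, let $\gamma=\{\gamma_\ell\}_{\ell\in\mathbb{N}}$ and $j=(j_1,j_2,\dots,j_d)\in\mathbb{N}^d$. As already mentioned, the eigenpairs $(\lambda_{d,\gamma,j},\eta_{d,\gamma,j})$ of $W_d$ are given by the products

(12) $\lambda_{d,\gamma,j}=\prod_{\ell=1}^d\lambda_{\gamma_\ell,j_\ell}=\prod_{\ell=1}^d(1-\omega_{\gamma_\ell})\,\omega_{\gamma_\ell}^{j_\ell-1}$,

where $\omega_\gamma$ is defined above in (10), and

$\eta_{d,\gamma,j}(x)=\prod_{\ell=1}^d\eta_{\gamma_\ell,j_\ell}(x_\ell)$ for all $x\in\mathbb{R}^d$,

with

$\langle\eta_{d,\gamma,i},\eta_{d,\gamma,j}\rangle_{H_d}=\delta_{i,j}$ for all $i,j\in\mathbb{N}^d$.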

This section ends with a lemma describing the convergence of the sums of powers 
of the eigenvalues for the multivariate problem, and how these sums depend on the 
dimension, d. This lemma is used in several of the theorems on convergence and 
tractability in the following sections. 

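In the next sections, it will be convenient to reorder the sequence of eigenvalues $\{\lambda_{d,\gamma,j}\}_{j\in\mathbb{N}^d}$ as the sequence $\{\lambda_{d,j}\}_{j\in\mathbb{N}}$ with $\lambda_{d,1}\ge\lambda_{d,2}\ge\cdots$. Obviously, for the univariate case, $d=1$, we have $\lambda_{1,j}=\lambda_{\gamma_1,j}$ for all $j\in\mathbb{N}$, but for the multivariate case, $d>1$, the correspondence between $\lambda_{d,j}$ and $\lambda_{d,\gamma,j}$ is more complex. In particular,

$\lambda_{d,1}=\prod_{\ell=1}^d\lambda_{\gamma_\ell,1}=\prod_{\ell=1}^d(1-\omega_{\gamma_\ell})$.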
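Combining $e^{\mathrm{wor}\text{-}\mathrm{all}}(n,H_d)=\sqrt{\lambda_{d,n+1}}$ from Section 2 with the product formula (12), the minimal n for the class $\Lambda^{\mathrm{all}}$ equals the number of eigenvalues $\lambda_{d,j}$ exceeding $\varepsilon^2\,\mathrm{CRI}_d^2$. The Python sketch below (the function name `n_all` is hypothetical) counts these eigenvalues by walking over multi-indices one coordinate at a time and pruning every branch whose largest attainable product cannot exceed the threshold, so the reordered sequence is never formed explicitly.

```python
import math


def omega(gamma):
    """omega_gamma from (10)."""
    g2 = gamma * gamma
    return g2 / (0.5 * (1.0 + math.sqrt(1.0 + 4.0 * g2)) + g2)


def n_all(eps, gammas, normalized=False):
    """Number of eigenvalues lambda_{d,j} > eps^2 CRI_d^2, i.e. n(eps, H_d) for Lambda^all."""
    w = [omega(g) for g in gammas]
    top = [1.0 - wl for wl in w]                   # lambda_{gamma_l,1}
    lam_max = math.prod(top)                       # lambda_{d,1} = ||I_d||^2
    thr = eps * eps * (lam_max if normalized else 1.0)
    d = len(w)
    suffix = [1.0] * (d + 1)                       # suffix[l] = prod_{m>=l} lambda_{gamma_m,1}
    for l in range(d - 1, -1, -1):
        suffix[l] = suffix[l + 1] * top[l]

    def count(l, partial):
        # partial = product of the univariate eigenvalues chosen for coordinates < l
        if l == d:
            return 1
        total, lam = 0, top[l]
        while partial * lam * suffix[l + 1] > thr:  # can this branch still exceed thr?
            total += count(l + 1, partial * lam)
            lam *= w[l]                            # next univariate eigenvalue
        return total

    return count(0, 1.0)


print(n_all(0.1, [1.0]), n_all(0.1, [1.0, 1.0]))
```

For example, `n_all(0.1, [1.0])` returns 5 and `n_all(0.1, [1.0, 1.0])` returns 10; comparing longer isotropic sequences with rapidly decaying ones such as $\gamma_\ell=\ell^{-2}$ illustrates the dimension dependence discussed in the tractability sections.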

We now present a simple estimate of Ad,n+i that will be needed for our analysis. 
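Lemma 1. Let $\tau>0$. Consider the Gaussian kernel with the sequence of shape parameters $\gamma=\{\gamma_\ell\}_{\ell\in\mathbb{N}}$. The sum of the $\tau$th power of the eigenvalues for the d-variate case, $d\ge1$, is

(13) $\sum_{j=1}^\infty\lambda_{d,j}^\tau=\sum_{j\in\mathbb{N}^d}\lambda_{d,\gamma,j}^\tau=\prod_{\ell=1}^d\left(\sum_{j=1}^\infty\lambda_{\gamma_\ell,j}^\tau\right)=\prod_{\ell=1}^d\frac{(1-\omega_{\gamma_\ell})^\tau}{1-\omega_{\gamma_\ell}^\tau}$,

and each factor of the last product exceeds 1 for $\tau\in(0,1)$. The $(n+1)$st largest eigenvalue satisfies

(14) $\lambda_{d,n+1}\le\left(\frac{1}{n+1}\prod_{\ell=1}^d\frac{(1-\omega_{\gamma_\ell})^\tau}{1-\omega_{\gamma_\ell}^\tau}\right)^{1/\tau}$.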

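Proof. Equation (13) follows directly from the formula for $\lambda_{d,\gamma,j}$ in (12). From the definition of $\omega_\gamma$ in (10) it follows that $0<\omega_\gamma<1$ for all $\gamma>0$. For $\tau\in(0,1)$, consider the function

$f(\omega)=(1-\omega)^\tau-1+\omega^\tau$ for all $\omega\in[0,1]$.

Clearly, f is concave and vanishes at 0 and 1, and therefore $f(\omega)>0$ for all $\omega\in(0,1)$. This yields the lower bound on the sum of the $\tau$th power of the univariate eigenvalues. The ordering of the eigenvalues $\lambda_{d,j}$ implies that

$\lambda_{d,n+1}\le\left(\frac{1}{n+1}\sum_{j=1}^{n+1}\lambda_{d,j}^\tau\right)^{1/\tau}\le\left(\frac{1}{n+1}\sum_{j=1}^{\infty}\lambda_{d,j}^\tau\right)^{1/\tau}=\left(\frac{1}{n+1}\prod_{\ell=1}^d\frac{(1-\omega_{\gamma_\ell})^\tau}{1-\omega_{\gamma_\ell}^\tau}\right)^{1/\tau}$.

This yields the upper bound on the $(n+1)$st largest eigenvalue in (14), and completes the proof. □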
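Lemma 1 can also be checked numerically in low dimension. The Python sketch below is our own illustration (the truncation level J, the chosen shape parameters and the function names are assumptions): it forms the products $\lambda_{d,\gamma,j}$ for $j\in\{1,\dots,J\}^d$, compares the truncated sum of $\tau$th powers with the closed form in (13), and checks the bound (14). Truncation only removes eigenvalues, so the $(n+1)$st entry of the truncated sorted list never exceeds the true $\lambda_{d,n+1}$.

```python
import itertools
import math


def omega(gamma):
    """omega_gamma from (10)."""
    g2 = gamma * gamma
    return g2 / (0.5 * (1.0 + math.sqrt(1.0 + 4.0 * g2)) + g2)


def sorted_eigenvalues(gammas, J=40):
    """Products lambda_{d,gamma,j}, j in {1,...,J}^d, sorted decreasingly (truncated)."""
    uni = [[(1.0 - omega(g)) * omega(g) ** (j - 1) for j in range(1, J + 1)] for g in gammas]
    return sorted((math.prod(c) for c in itertools.product(*uni)), reverse=True)


def power_sum(gammas, tau):
    """Closed form in (13): prod_l (1 - omega_l)^tau / (1 - omega_l^tau)."""
    return math.prod((1.0 - omega(g)) ** tau / (1.0 - omega(g) ** tau) for g in gammas)


def bound_14(gammas, tau, n):
    """Right-hand side of (14) for the (n+1)st largest eigenvalue."""
    return (power_sum(gammas, tau) / (n + 1)) ** (1.0 / tau)


gammas = [1.0, 0.5, 0.25]
lams = sorted_eigenvalues(gammas)
print(sum(v ** 0.5 for v in lams), power_sum(gammas, 0.5))
for tau in (0.25, 0.5, 1.0):
    print(tau, [lams[n] <= bound_14(gammas, tau, n) for n in (0, 10, 100, 1000)])
```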

The main point of (14) is that this estimate holds for all positive $\tau$. This means that $\lambda_{d,n+1}$ goes to zero faster than any polynomial in $(n+1)^{-1}$.

4. Rates of Convergence for Gaussian Kernels 

In this section we consider the function approximation problem for the Hilbert space $H_d=H(K_d)$ with the anisotropic Gaussian kernel given by (1b). We stress that the sequence $\gamma=\{\gamma_\ell\}_{\ell\in\mathbb{N}}$ of shape parameters can be arbitrary. In particular, we may consider the isotropic case for which all $\gamma_\ell=\gamma>0$.

We want to verify how fast the minimal errors $e^{\mathrm{wor}\text{-}\mathrm{all}}(n,H_d)$ and $e^{\mathrm{wor}\text{-}\mathrm{std}}(n,H_d)$ go to zero, and what the rate of convergence of these sequences is, see (6).

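Theorem 1. For an arbitrary sequence of shape parameters $\gamma$,

$r^{\mathrm{wor}\text{-}\mathrm{all}}(H_d)=r^{\mathrm{wor}\text{-}\mathrm{std}}(H_d)=\infty$.

Proof. For the class $\Lambda^{\mathrm{all}}$ we know that $e^{\mathrm{wor}\text{-}\mathrm{all}}(n,H_d)=\sqrt{\lambda_{d,n+1}}$, where $\lambda_{d,n+1}$ is the $(n+1)$st largest eigenvalue of $W_d$. Lemma 1 demonstrates that $\lambda_{d,n+1}$ is bounded by $(n+1)^{-1/\tau}$ times a dimension-dependent constant. This implies that $r^{\mathrm{wor}\text{-}\mathrm{all}}(H_d)\ge1/(2\tau)$, and since $\tau$ can be arbitrarily small, we conclude that

$r^{\mathrm{wor}\text{-}\mathrm{all}}(H_d)=\infty$,

as claimed.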
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Consider now the class A^**^. We use Theorem 5 from [10], which states that if there 
exist numbers p > 1 and B such that 

(15) Xd,n<Bn-P for all neN 

then for all S e (0, 1) and n E N there exists a linear algorithm A„ that uses at 
most n function values and its worst case error is bounded by 

e^°'^(A„) < BCs,p {n + i)-(i-'5)pV(2p+2)_ 

Here, Cs,p is independent of n and d and depends only on S and p. 

Note that assumption (15) holds in our case for an arbitrarily large p with B 
that can depend on d. Hence, r™°''~***'^(ifd) > (1 — 5)p'^/{2p + 2), and since 5 can be 
arbitrarily small and p can be arbitrarily large we conclude 

as claimed. This completes the proof. □ 

We stress that the algorithm $A_n$ used in the proof is non-constructive. However, there are known algorithms that use only function values and whose worst case error goes to zero like $n^{-p}$ for an arbitrarily large $p$. In fact, given a design, it is known that the spline algorithm is the best way to use the function data given via that design. Thus, the search for an algorithm with optimal convergence rates focuses on the choice of a good design. One such design was proposed by Smolyak as early as 1963, see [15], and today it is usually referred to as a sparse grid; see [3] for a survey. An associated algorithm from which this design naturally arises is Smolyak's algorithm. The essence of this algorithm is to use a certain tensor product of univariate algorithms. Then, if the univariate algorithm has worst case error of order $n^{-p}$, the worst case error for the $d$-variate case is also of order $n^{-p}$ modulo some powers of $\ln n$; see, e.g., [21].
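The paper does not prescribe a univariate rule, so the following Python sketch uses a hypothetical choice purely to show how a Smolyak sparse-grid design is assembled: nested dyadic point sets on $[0,1]$ (the midpoint at level 1, and $2^{i-1}+1$ equispaced points at level $i \ge 2$), combined through the standard index set $q-d+1 \le |\mathbf i| \le q$ with coefficients $(-1)^{q-|\mathbf i|}\binom{d-1}{q-|\mathbf i|}$. It builds only the design, not the spline algorithm applied to it.

```python
from fractions import Fraction
from itertools import product
from math import comb


def level_points(i):
    """Hypothetical nested 1-D design on [0, 1]: midpoint at level 1, 2**(i-1)+1 dyadic points after."""
    if i == 1:
        return [Fraction(1, 2)]
    m = 2 ** (i - 1)
    return [Fraction(k, m) for k in range(m + 1)]


def smolyak_coefficients(d, q):
    """Coefficients (-1)**(q-|i|) * C(d-1, q-|i|) for multi-indices i >= 1 with q-d+1 <= |i| <= q."""
    coeffs = {}
    for i in product(range(1, q - d + 2), repeat=d):
        s = sum(i)
        if q - d + 1 <= s <= q:
            coeffs[i] = (-1) ** (q - s) * comb(d - 1, q - s)
    return coeffs


def sparse_grid(d, q):
    """Union of the tensor-product designs used by the combination formula (requires q >= d)."""
    points = set()
    for i in smolyak_coefficients(d, q):
        points.update(product(*(level_points(k) for k in i)))
    return points


# Sparse grid size vs. the full tensor grid at the same finest univariate level.
for d, q in [(2, 4), (3, 5), (4, 6)]:
    finest = q - d + 1
    print(d, len(sparse_grid(d, q)), len(level_points(finest)) ** d)
```

The coefficients sum to one, so the combination reproduces constants, and the number of points grows much more slowly with $d$ than that of the full tensor grid.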

Theorem 1 states that as long as only the rate of convergence is considered, the function approximation problem for Hilbert spaces with Gaussian kernels is easy. This is not surprising, since functions in this class are very smooth. However, the rate of convergence tells us nothing about the dependence on $d$. As long as $d$ is small, the dependence on $d$ is irrelevant. But if $d$ is large, we want to check the dependence on $d$. We are especially concerned about an exponential dependence on $d$, which, after Bellman, is called the curse of dimensionality. There may also be a tradeoff between the rate of convergence and the dependence on $d$. Furthermore, the results may now depend on the weights $\gamma_\ell$. This is the subject of the next sections.

5. Tractability for the Absolute Error Criterion 

As in the previous section we consider the function approximation problem for Hilbert spaces $H_d = H(K_d)$ with a Gaussian kernel. We now consider the absolute error criterion and we want to verify whether polynomial tractability holds. Let us recall that we study the minimal number of functionals from the class $\Lambda^{\rm all}$ or $\Lambda^{\rm std}$ needed to guarantee a worst case error of at most $\varepsilon$,
$$n^{\rm wor-abs-\vartheta}(\varepsilon,H_d) = \min\{\, n \mid e^{\rm wor-\vartheta}(n,H_d) \le \varepsilon \,\}, \qquad \vartheta \in \{{\rm std}, {\rm all}\}.$$
5.1. Arbitrary Linear Functionals.

We first analyze the class $\Lambda^{\rm all}$ and polynomial tractability.

Theorem 2. Consider the function approximation problem $I = \{I_d\}_{d\in\mathbb N}$ for Hilbert spaces with isotropic or anisotropic Gaussian kernels with arbitrary positive $\gamma_\ell$ for the class $\Lambda^{\rm all}$ and the absolute error criterion. Then

• $I$ is strongly polynomially tractable with exponent of strong polynomial tractability at most 2. For all $d \in \mathbb N$ and $\varepsilon \in (0,1)$ we have
$$e^{\rm wor-all}(n,H_d) \le (n+1)^{-1/2}, \qquad n^{\rm wor-abs-all}(\varepsilon,H_d) \le \varepsilon^{-2}.$$

• For the isotropic Gaussian kernel the exponent of strong tractability is 2, so that the bound above is best possible in terms of the exponent of $\varepsilon^{-1}$. Furthermore, strong polynomial tractability is equivalent to polynomial tractability.

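Proof. We use Theorem 5.1 from [11]. This theorem says that $I$ is strongly polynomially tractable iff there exist two positive numbers $C_1$ and $\tau$ such that
$$C_2 := \sup_{d\in\mathbb N}\Big(\sum_{j=\lceil C_1\rceil}^{\infty}\lambda_{d,j}^{\tau}\Big)^{1/\tau} < \infty.$$
If so, then
$$n^{\rm wor-abs-all}(\varepsilon,H_d) \le (C_1 + C_2^{\tau})\,\varepsilon^{-2\tau} \quad \text{for all } d \in \mathbb N \text{ and } \varepsilon \in (0,1).$$
Furthermore, the exponent of strong polynomial tractability is
$$p^{\rm all} = \inf\{\,2\tau \mid \tau \text{ for which } C_2 < \infty\,\}.$$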

Let $\tau = 1$. Then, by (9), no matter what the weights $\gamma_\ell$ are, we can take an arbitrarily small $C_1$ so that $\lceil C_1\rceil = 1$ and $C_2 = 1$, as well as $n^{\rm wor-abs-all}(\varepsilon,H_d) \le (C_1+1)\,\varepsilon^{-2}$. Letting $C_1$ tend to zero, we conclude the bound
$$n^{\rm wor-abs-all}(\varepsilon,H_d) \le \varepsilon^{-2}.$$
Furthermore, by (14) in Lemma 1 with $\tau = 1$, it follows that
$$e^{\rm wor-all}(n,H_d) = \sqrt{\lambda_{d,n+1}} \le (n+1)^{-1/2},$$
as claimed.
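The bound $n^{\rm wor-abs-all}(\varepsilon,H_d) \le \varepsilon^{-2}$ reflects the fact that fewer than $\varepsilon^{-2}$ eigenvalues can exceed $\varepsilon^2$ when they sum to one. The Python sketch below counts those eigenvalues directly with a pruned depth-first search over exponent vectors. It assumes geometric univariate eigenvalues $\lambda_{\gamma,j} = (1-\omega_\gamma)\omega_\gamma^{j-1}$ with the closed form of $\omega_\gamma$ given in the code; the function names are hypothetical.

```python
import math


def omega(gamma):
    """Assumed univariate eigenvalue ratio omega_gamma."""
    g2 = gamma * gamma
    return 2.0 * g2 / (1.0 + 2.0 * g2 + math.sqrt(1.0 + 4.0 * g2))


def n_abs_all(gammas, eps):
    """Number of d-variate eigenvalues exceeding eps**2, i.e. n^{wor-abs-all}(eps, H_d)."""
    ws = [omega(g) for g in gammas]
    thresh = eps * eps

    def count(l, val):
        # val holds all factors (1 - w) and the powers already chosen for coordinates < l.
        if val <= thresh:
            return 0
        if l == len(ws):
            return 1
        total = 0
        while val > thresh:  # later factors are <= 1, so nothing below thresh can recover
            total += count(l + 1, val)
            val *= ws[l]
        return total

    return count(0, math.prod(1.0 - w for w in ws))


for gammas in ([1.0] * 6, [2.0, 1.0, 0.5, 0.25], [0.1] * 8):
    for eps in (0.5, 0.2, 0.1):
        assert n_abs_all(gammas, eps) <= eps ** -2
```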
Assume now the isotropic case, i.e., $\gamma_\ell = \gamma$ for all $\ell \in \mathbb N$. Then for any positive $C_1$ and $\tau$ we use Lemma 1 and obtain
$$\sum_{j=\lceil C_1\rceil}^{\infty}\lambda_{d,j}^{\tau} = \sum_{j=1}^{\infty}\lambda_{d,j}^{\tau} - \sum_{j=1}^{\lceil C_1\rceil-1}\lambda_{d,j}^{\tau} \ge \Big(\frac{(1-\omega_\gamma)^{\tau}}{1-\omega_\gamma^{\tau}}\Big)^{d} - (\lceil C_1\rceil-1)(1-\omega_\gamma)^{\tau d}.$$

For $\tau \in (0,1)$, we know from Lemma 1 that $(1-\omega_\gamma)^{\tau}/(1-\omega_\gamma^{\tau}) > 1$, and therefore the last expression goes exponentially fast to infinity with $d$. This proves that $C_2 = \infty$ for all $\tau \in (0,1)$. Hence, the exponent of strong tractability is two.
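The blow-up used here is easy to see numerically. This Python sketch assumes, as elsewhere, $\lambda_{\gamma,j} = (1-\omega_\gamma)\omega_\gamma^{j-1}$ with the closed form of $\omega_\gamma$ written in the code. It checks the univariate identity $\sum_j\lambda_{\gamma,j}^\tau = (1-\omega_\gamma)^\tau/(1-\omega_\gamma^\tau)$ against a truncated series and prints the $d$th power for several $\tau < 1$, which grows geometrically in $d$; for $\tau = 1$ the value is exactly 1.

```python
import math


def omega(gamma):
    """Assumed univariate eigenvalue ratio omega_gamma."""
    g2 = gamma * gamma
    return 2.0 * g2 / (1.0 + 2.0 * g2 + math.sqrt(1.0 + 4.0 * g2))


def univariate_power_sum(gamma, tau, terms=10_000):
    """Truncated series sum_j lambda_{gamma,j}^tau."""
    w = omega(gamma)
    return sum(((1.0 - w) * w ** (j - 1)) ** tau for j in range(1, terms + 1))


def isotropic_power_sum(gamma, tau, d):
    """Closed form sum_j lambda_{d,j}^tau = ((1-w)^tau / (1-w^tau))^d in the isotropic case."""
    w = omega(gamma)
    return ((1.0 - w) ** tau / (1.0 - w ** tau)) ** d


for tau in (0.25, 0.5, 0.9):
    assert isotropic_power_sum(1.0, tau, 1) > 1.0  # base > 1, so the d-th power explodes
    print(tau, [round(isotropic_power_sum(1.0, tau, d), 2) for d in (1, 10, 50)])
```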

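Finally, to prove that strong polynomial tractability is equivalent to polynomial tractability, it is enough to show that polynomial tractability implies strong polynomial tractability. From Theorem 5.1 of [11] we know that polynomial tractability holds iff there exist numbers $C_1 > 0$, $q_1 \ge 0$, $q_2 \ge 0$ and $\tau > 0$ such that
$$C_2 := \sup_{d\in\mathbb N}\Big\{ d^{-q_2}\Big(\sum_{j=\lceil C_1 d^{q_1}\rceil}^{\infty}\lambda_{d,j}^{\tau}\Big)^{1/\tau}\Big\} < \infty.$$
If so, then
$$n^{\rm wor-abs-all}(\varepsilon,H_d) \le (C_1 + C_2^{\tau})\, d^{\max(q_1,\,q_2\tau)}\,\varepsilon^{-2\tau}$$
for all $\varepsilon \in (0,1)$ and $d \in \mathbb N$. Note that for all $d$ we have
$$d^{-q_2\tau}\Big[\Big(\frac{(1-\omega_\gamma)^{\tau}}{1-\omega_\gamma^{\tau}}\Big)^{d} - (\lceil C_1 d^{q_1}\rceil - 1)(1-\omega_\gamma)^{\tau d}\Big] \le C_2^{\tau} < \infty.$$
This implies that $\tau \ge 1$. On the other hand, for $\tau = 1$ we can take $q_1 = q_2 = 0$ and an arbitrarily small $C_1$, and obtain strong tractability. This completes the proof. □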

We now compare Theorems 1 and 2. Theorem 1 says that for any $p$ we have
$$e^{\rm wor-all}(n,H_d) = O(n^{-p}),$$
but the factor in the big $O$ notation may depend on $d$. In fact, from Theorem 2 we conclude that, for the isotropic case, it indeed depends more than polynomially on $d$ for all $p > 1/2$. Hence, a good rate of convergence does not necessarily mean much for large $d$.

The exponent of strong polynomial tractability is 2 for the isotropic case. We now check how the exponent of strong polynomial tractability depends on the sequence $\gamma = \{\gamma_\ell\}_{\ell\in\mathbb N}$ of shape parameters. The determining factor is the quantity $r(\gamma)$ introduced in (2), which measures the rate of decay of the shape parameter sequence.
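Theorem 3. Consider the function approximation problem $I = \{I_d\}_{d\in\mathbb N}$ for Hilbert spaces with isotropic or anisotropic Gaussian kernels for the class $\Lambda^{\rm all}$ and the absolute error criterion. Let $r(\gamma)$ be the rate of decay of shape parameters. Then

• $I$ is strongly polynomially tractable with exponent
$$p^{\rm all} = \min\Big(2, \frac{1}{r(\gamma)}\Big) \le 2.$$

• For all $d \in \mathbb N$, $\varepsilon \in (0,1)$ and $\delta \in (0,1)$ we have
$$e^{\rm wor-all}(n,H_d) = O\big(n^{-1/p^{\rm all}+\delta}\big) = O\big(n^{-\max(r(\gamma),1/2)+\delta}\big), \qquad n^{\rm wor-abs-all}(\varepsilon,H_d) = O\big(\varepsilon^{-(p^{\rm all}+\delta)}\big),$$
where the factors in the big $O$ notation do not depend on $d$ and $\varepsilon^{-1}$ but may depend on $\delta$.

• Furthermore, in the case of ordered shape parameters, i.e., $\gamma_1 \ge \gamma_2 \ge \cdots$, if
$$n^{\rm wor-abs-all}(\varepsilon,H_d) = O(\varepsilon^{-p}\, d^{q}) \quad \text{for all } \varepsilon \in (0,1) \text{ and } d \in \mathbb N,$$
then $p \ge p^{\rm all}$, which means that strong polynomial tractability is equivalent to polynomial tractability.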

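Proof. As in the proof of Theorem 2, $I$ is strongly polynomially tractable iff there exist two positive numbers $C_1$ and $\tau$ such that
$$C_2 := \sup_{d\in\mathbb N}\Big(\sum_{j=\lceil C_1\rceil}^{\infty}\lambda_{d,j}^{\tau}\Big)^{1/\tau} < \infty.$$
Furthermore, the exponent $p^{\rm all}$ of strong polynomial tractability is the infimum of $2\tau$ for which this condition holds. Proceeding similarly as before, we have
$$\sum_{j=\lceil C_1\rceil}^{\infty}\lambda_{d,j}^{\tau} = \sum_{j=1}^{\infty}\lambda_{d,j}^{\tau} - \sum_{j=1}^{\lceil C_1\rceil-1}\lambda_{d,j}^{\tau} = \prod_{\ell=1}^{d}\frac{(1-\omega_{\gamma_\ell})^{\tau}}{1-\omega_{\gamma_\ell}^{\tau}} - \sum_{j=1}^{\lceil C_1\rceil-1}\lambda_{d,j}^{\tau},$$
and since $\lambda_{d,j} \le 1$,
$$\prod_{\ell=1}^{d}\frac{(1-\omega_{\gamma_\ell})^{\tau}}{1-\omega_{\gamma_\ell}^{\tau}} - \lceil C_1\rceil + 1 \le \sum_{j=\lceil C_1\rceil}^{\infty}\lambda_{d,j}^{\tau} \le \prod_{\ell=1}^{d}\frac{(1-\omega_{\gamma_\ell})^{\tau}}{1-\omega_{\gamma_\ell}^{\tau}}.$$
Therefore, $I$ is strongly polynomially tractable iff there exists a positive $\tau$ such that
$$C_3 := \sup_{d\in\mathbb N}\prod_{\ell=1}^{d}\frac{(1-\omega_{\gamma_\ell})^{\tau}}{1-\omega_{\gamma_\ell}^{\tau}} < \infty,$$
and the exponent $p^{\rm all}$ is the infimum of $2\tau$ for which the last condition holds.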

As we already know, this holds for $\tau = 1$. Take now $\tau \in (0,1)$. Since $(1-\omega_{\gamma_\ell})^{\tau}/(1-\omega_{\gamma_\ell}^{\tau}) > 1$, $C_3 < \infty$ implies that
$$\lim_{\ell\to\infty}\frac{(1-\omega_{\gamma_\ell})^{\tau}}{1-\omega_{\gamma_\ell}^{\tau}} = 1.$$
Taking into account (10), it is easy to check that the last condition is equivalent to
$$\lim_{\ell\to\infty}\omega_{\gamma_\ell} = \lim_{\ell\to\infty}\gamma_\ell = 0.$$
Furthermore, $C_3 < \infty$ implies that
$$\sum_{\ell=1}^{\infty}\gamma_\ell^{2\tau} < \infty,$$
and $r(\gamma) \ge 1/(2\tau) > 1/2$. Hence, $p^{\rm all} < 2$ only if $r(\gamma) > 1/2$. On the other hand, $2\tau \ge 1/r(\gamma)$ and therefore $p^{\rm all} \ge 1/r(\gamma)$. This establishes the formula for $p^{\rm all}$. The estimates on $e^{\rm wor-all}(n,H_d)$ and $n^{\rm wor-abs-all}(\varepsilon,H_d)$ follow from the definition of strong tractability.

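Assume now polynomial tractability with $p < 2$ and an arbitrary $q$. Then $\lambda_{d,n+1} \le \varepsilon^2$ for $n = O(\varepsilon^{-p} d^{q})$. Hence,
$$\lambda_{d,n+1} = O\big(d^{2q/p}\,(n+1)^{-2/p}\big).$$
This implies
$$\sum_{j=1}^{\infty}\lambda_{d,j}^{\tau} = \prod_{\ell=1}^{d}\frac{(1-\omega_{\gamma_\ell})^{\tau}}{1-\omega_{\gamma_\ell}^{\tau}} = O\big(d^{2q\tau/p}\big) \quad \text{for all } 2\tau > p.$$
For $\tau < 1$, taking logarithms and using (10), this yields
$$\limsup_{d\to\infty}\frac{\sum_{\ell=1}^{d}\gamma_\ell^{2\tau}}{\ln d} < \infty.$$
Since the $\gamma_\ell$'s are ordered, we have
$$\frac{d\,\gamma_d^{2\tau}}{\ln d} \le \frac{\sum_{\ell=1}^{d}\gamma_\ell^{2\tau}}{\ln d} = O(1),$$
and $\gamma_d = O\big((\ln(d)/d)^{1/(2\tau)}\big)$. Hence, $r(\gamma) \ge 1/(2\tau)$ and $r(\gamma) \ge 1/p$. This means that $2 > p \ge 1/r(\gamma) = p^{\rm all}$, as claimed. □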
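The criterion $C_3 < \infty$ can be watched numerically. For the hypothetical choice $\gamma_\ell = \ell^{-s}$ one has $r(\gamma) = s$, and $C_3$ should be finite when $2s\tau > 1$ and infinite when $2s\tau < 1$. The Python sketch below, which again assumes the closed form of $\omega_\gamma$ written in the code, compares the logarithms of the partial products at $d = 10^4$ and $d = 10^5$. It only illustrates the trend and proves nothing about the limit.

```python
import math


def omega(gamma):
    """Assumed univariate eigenvalue ratio omega_gamma."""
    g2 = gamma * gamma
    return 2.0 * g2 / (1.0 + 2.0 * g2 + math.sqrt(1.0 + 4.0 * g2))


def log_partial_c3(s, tau, d):
    """log of prod_{l<=d} (1 - w_l)^tau / (1 - w_l^tau) for gamma_l = l**(-s)."""
    total = 0.0
    for l in range(1, d + 1):
        w = omega(l ** -s)
        total += tau * math.log1p(-w) - math.log1p(-(w ** tau))
    return total


for s, tau in [(2.0, 0.5), (1.0, 0.4)]:
    growth = log_partial_c3(s, tau, 100_000) - log_partial_c3(s, tau, 10_000)
    print(f"s={s}, tau={tau}, 2*s*tau>1: {2 * s * tau > 1}, log-growth 1e4->1e5: {growth:.4g}")
```

Working with logarithms avoids overflow in the divergent case, where the partial products become very large.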

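It is interesting to notice that the last part of Theorem 3 does not hold, in general, for unordered shape parameters. Indeed, for $s > 1/2$, take
$$\gamma_{a_k} = 1 \quad \text{for all natural } k, \text{ with } a_k = 2^{2^k},$$
$$\gamma_\ell = \ell^{-s} \quad \text{for all natural } \ell \text{ not equal to any } a_k.$$
Then strong polynomial tractability holds with the exponent 2, since $C_3 = \infty$ in the proof of Theorem 3 for all $\tau < 1$. On the other hand, we have polynomial tractability with $p = 1/s < 2$ and $q$ arbitrarily close to $1/(2s)$. Indeed, for $\tau = 1/(2s)$, $q_1 = 0$ and $q_2 > 1$, only $O(\ln\ln d)$ of the shape parameters $\gamma_1,\dots,\gamma_d$ are equal to 1, and we have
$$\sup_{d\in\mathbb N}\, d^{-q_2}\prod_{\ell=1}^{d}\frac{(1-\omega_{\gamma_\ell})^{\tau}}{1-\omega_{\gamma_\ell}^{\tau}} < \infty.$$
This implies that $n^{\rm wor-abs-all}(\varepsilon,H_d)$ is bounded polynomially in $d$ and $\varepsilon^{-1}$, with $\varepsilon^{-1}$ appearing to the power $2\tau = 1/s$.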



Theorem 3 states that the exponent of strong polynomial tractability is 2 for all shape parameters for which $r(\gamma) \le 1/2$. Only if $r(\gamma) > 1/2$ is the exponent smaller than 2. Again, although the rate of convergence of $e^{\rm wor-all}(n,H_d)$ is always excellent, the dependence on $d$ is eliminated only at the expense of the exponent of convergence, which must be roughly $\max(r(\gamma), 1/2)$. Of course, if we take an exponentially decaying sequence of shape parameters, say, $\gamma_\ell = q^{\ell}$ for some $q \in (0,1)$, then $r(\gamma) = \infty$ and $p^{\rm all} = 0$. In this case, we have an excellent rate of convergence without any dependence on $d$.

Although Theorem 2 is for Gaussian kernels, it is easy to extend this theorem to other positive definite translation invariant or radially symmetric kernels. Indeed, for translation invariant kernels the only difference is that for $\tau = 1$ the sum of the eigenvalues is not necessarily one but
$$\sum_{j=1}^{\infty}\lambda_{d,j} = K_d(0).$$
Hence, for all $\varepsilon \in (0,1)$ and $d \in \mathbb N$ we have
$$e^{\rm wor-all}(n,H_d) \le \Big(\frac{K_d(0)}{n+1}\Big)^{1/2} \quad \text{and} \quad n^{\rm wor-abs-all}(\varepsilon,H_d) \le K_d(0)\,\varepsilon^{-2}.$$

Tractability then depends on how $K_d(0)$ depends on $d$. In particular, it is easy to check the following facts.

• If
$$\sup_{d\in\mathbb N} K_d(0) < \infty,$$
then we have strong polynomial tractability with exponent at most 2, i.e.,
$$n^{\rm wor-abs-all}(\varepsilon,H_d) = O(\varepsilon^{-2}).$$

• If there exists a nonnegative $q$ such that
$$\sup_{d\in\mathbb N} K_d(0)\, d^{-q} < \infty,$$
then we have polynomial tractability and
$$n^{\rm wor-abs-all}(\varepsilon,H_d) = O(d^{q}\,\varepsilon^{-2}).$$

• If
$$\lim_{d\to\infty}\frac{\ln\max(K_d(0),1)}{d} = 0,$$
then we have weak tractability.
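For radially symmetric kernels, the situation is even simpler since
$$\sum_{j=1}^{\infty}\lambda_{d,j} = k(0),$$
and it does not depend on $d$. Hence,
$$e^{\rm wor-all}(n,H_d) \le \Big(\frac{k(0)}{n+1}\Big)^{1/2} \quad \text{and} \quad n^{\rm wor-abs-all}(\varepsilon,H_d) \le k(0)\,\varepsilon^{-2},$$
and we have strong polynomial tractability with exponent at most 2.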

Extending Theorem 3 to arbitrary stationary or isotropic kernels is not so straightforward. To achieve strong tractability exponents smaller than 2, one needs to know the sum of the powers of the eigenvalues and their dependence on $d$. One would suspect, as is the case for Gaussian kernels, that some sort of anisotropy is needed to obtain strong tractability exponents better than 2.

5.2. Only Function Values.

We now turn to the class $\Lambda^{\rm std}$ and prove the following theorem.

Theorem 4. Consider the function approximation problem $I = \{I_d\}_{d\in\mathbb N}$ for Hilbert spaces with isotropic or anisotropic Gaussian kernels for the class $\Lambda^{\rm std}$ and the absolute error criterion. Then

• $I$ is strongly polynomially tractable with exponent of strong polynomial tractability at most 4. For all $d \in \mathbb N$ and $\varepsilon \in (0,1)$ we have
$$e^{\rm wor-std}(n,H_d) \le \sqrt{2}\, n^{-1/4}\Big(1 + \frac{1}{2\sqrt{n}}\Big)^{1/2}, \qquad n^{\rm wor-abs-std}(\varepsilon,H_d) \le \Big\lceil \frac{\big(1+\sqrt{1+\varepsilon^2}\big)^2}{\varepsilon^4}\Big\rceil.$$

• For the isotropic Gaussian kernel the exponent of strong tractability is at least 2. Furthermore, strong polynomial tractability is equivalent to polynomial tractability.

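Proof. We now use Theorem 1 from [22]. This theorem says that
$$e^{\rm wor-std}(n,H_d) \le \min_{k=0,1,\dots}\Big([e^{\rm wor-all}(k,H_d)]^2 + \frac{k}{n}\Big)^{1/2}. \tag{16}$$
Taking $k = \lceil n^{1/2}\rceil$ and remembering that $e^{\rm wor-all}(k,H_d) \le (k+1)^{-1/2}$, we obtain
$$[e^{\rm wor-std}(n,H_d)]^2 \le \frac{1}{k+1} + \frac{k}{n} \le \frac{1}{\sqrt n} + \frac{\sqrt n + 1}{n} = \frac{2}{\sqrt n}\Big(1 + \frac{1}{2\sqrt n}\Big),$$
as claimed. Solving $e^{\rm wor-std}(n,H_d) \le \varepsilon$, we obtain the bound on $n^{\rm wor-abs-std}(\varepsilon,H_d)$.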
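The arithmetic in this proof can be checked directly. The Python sketch below evaluates the right-hand side of (16) with $k = \lceil n^{1/2}\rceil$ and $[e^{\rm wor-all}(k,H_d)]^2 \le 1/(k+1)$ from Theorem 2, compares it with the closed-form bound $\sqrt2\,n^{-1/4}(1+1/(2\sqrt n))^{1/2}$, and confirms that $n = \lceil(1+\sqrt{1+\varepsilon^2})^2/\varepsilon^4\rceil$ suffices for error $\varepsilon$. The closed forms come from solving $2/\sqrt n + 1/n \le \varepsilon^2$; they are not taken from [22], and the function names are hypothetical.

```python
import math


def e_std_via_16(n):
    """Right-hand side of (16) with k = ceil(sqrt(n)) and e^{wor-all}(k)^2 <= 1/(k+1)."""
    k = math.ceil(math.sqrt(n))
    return math.sqrt(1.0 / (k + 1) + k / n)


def e_std_bound(n):
    """Closed-form bound sqrt(2) * n^(-1/4) * (1 + 1/(2 sqrt(n)))^(1/2)."""
    return math.sqrt(2.0) * n ** -0.25 * math.sqrt(1.0 + 1.0 / (2.0 * math.sqrt(n)))


def n_std_bound(eps):
    """ceil((1 + sqrt(1 + eps^2))^2 / eps^4), from solving 2/sqrt(n) + 1/n <= eps^2."""
    return math.ceil((1.0 + math.sqrt(1.0 + eps * eps)) ** 2 / eps ** 4)


for eps in (0.5, 0.1, 0.05):
    n = n_std_bound(eps)
    print(eps, n, e_std_bound(n) <= eps)
```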
For the isotropic case, we know from Theorem 2 that the exponent of strong tractability for the class $\Lambda^{\rm all}$ is 2. For the class $\Lambda^{\rm std}$, the exponent cannot be smaller.

Finally, assume that we have polynomial tractability for the class $\Lambda^{\rm std}$. Then we also have polynomial tractability for the class $\Lambda^{\rm all}$. From Theorem 2 we know that then strong tractability for the class $\Lambda^{\rm all}$ holds. Furthermore, we know that the exponent of strong tractability is 2 and $n^{\rm wor-abs-all}(\varepsilon,H_d) \le \varepsilon^{-2}$. As above, we then get strong tractability also for $\Lambda^{\rm std}$ with the exponent at most 4. This completes the proof. □

We do not know if the error bound of order $n^{-1/4}$ is sharp for the class $\Lambda^{\rm std}$. We suspect that it is not sharp and that maybe even an error bound of order $n^{-1/2}$ holds for the class $\Lambda^{\rm std}$, exactly as for the class $\Lambda^{\rm all}$.

For fast decaying shape parameters it is possible to improve Theorem 4. This is 
the subject of our next theorem. 

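Theorem 5. Consider the function approximation problem $I = \{I_d\}_{d\in\mathbb N}$ for Hilbert spaces with isotropic or anisotropic Gaussian kernels for the class $\Lambda^{\rm std}$ and the absolute error criterion. Let $r(\gamma) > 1/2$. Then

• $I$ is strongly polynomially tractable with exponent at most
$$\frac{1}{r(\gamma)} + \frac{1}{2r^2(\gamma)} = p^{\rm all} + \frac12\big(p^{\rm all}\big)^2 < 4.$$

• For all $d \in \mathbb N$, $\varepsilon \in (0,1)$ and $\delta \in (0,1)$ we have
$$e^{\rm wor-std}(n,H_d) = O\Big(n^{-r^2(\gamma)/(r(\gamma)+1/2)+\delta}\Big), \qquad n^{\rm wor-abs-std}(\varepsilon,H_d) = O\Big(\varepsilon^{-(1/r(\gamma)+1/(2r^2(\gamma))+\delta)}\Big),$$
where the factors in the big $O$ notation do not depend on $d$ and $\varepsilon^{-1}$ but may depend on $\delta$.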

Proof. For $r(\gamma) > 1/2$, Theorem 3 for the class $\Lambda^{\rm all}$ states that the exponent of strong polynomial tractability is $p^{\rm all} = 1/r(\gamma)$. This means that for all $\eta \in (0,1)$ we have
$$\lambda_{d,n} = O\big(n^{-(2r(\gamma)-\eta)}\big),$$
with the factor in the big $O$ notation independent of $n$ and $d$ but dependent on $\eta$. Since $2r(\gamma) > 1$, it follows that for all positive $\eta$ small enough, $p = 2r(\gamma) - \eta > 1$. Applying Theorem 5 from [10] as in the proof of Theorem 1, it follows that for any $\delta_1 \in (0,1)$ we have
$$e^{\rm wor-std}(n,H_d) = O\big(n^{-(1-\delta_1)p^2/(2p+2)}\big),$$
again with the factor in the big $O$ notation independent of $n$ and $d$ but dependent on $\delta_1$ and $\eta$. This leads to the estimates of the theorem. □
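The exponent bookkeeping in this proof can be checked with exact rational arithmetic. The Python sketch below (function names are illustrative) takes $p = 2r(\gamma)$, i.e. the limit $\eta \to 0$, $\delta_1 \to 0$, and confirms that the rate $(1-\delta_1)p^2/(2p+2)$ from Theorem 5 of [10] becomes $r^2(\gamma)/(r(\gamma)+1/2)$, whose reciprocal is $1/r(\gamma) + 1/(2r^2(\gamma))$.

```python
from fractions import Fraction


def std_rate(p, delta=Fraction(0)):
    """Convergence exponent (1 - delta) p^2 / (2p + 2) for eigenvalue decay n^(-p), p > 1."""
    return (1 - delta) * p * p / (2 * p + 2)


def std_exponent_limit(r):
    """Limiting tractability exponent 1/r + 1/(2 r^2) obtained with p = 2r."""
    return 1 / r + 1 / (2 * r * r)


for r in (Fraction(3, 5), Fraction(1), Fraction(5, 2)):
    rate = std_rate(2 * r)
    assert rate == r * r / (r + Fraction(1, 2))
    assert 1 / rate == std_exponent_limit(r)
    print(r, rate, std_exponent_limit(r), 1 / r)  # last column: 1/r for comparison
```

Using `Fraction` instead of floats makes the identities exact rather than approximate.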



Note that for large $r(\gamma)$, the exponents of strong polynomial tractability are nearly the same for both classes $\Lambda^{\rm all}$ and $\Lambda^{\rm std}$. For an exponentially decaying sequence of shape parameters, say, $\gamma_\ell = q^{\ell}$ for some $q \in (0,1)$, we have $p^{\rm all} = p^{\rm std} = 0$, and the rates of convergence are excellent and independent of $d$.



6. Tractability for the Normalized Error Criterion 

We now consider the function approximation problem for Hilbert spaces $H_d = H(K_d)$ with a Gaussian kernel for the normalized error criterion. That is, we want to find the smallest $n$ for which
$$e^{\rm wor-\vartheta}(n,H_d) \le \varepsilon\,\|I_d\|, \qquad \vartheta \in \{{\rm std}, {\rm all}\}.$$
Note that $\|I_d\| = \sqrt{\lambda_{d,1}} \le 1$ and it can be exponentially small in $d$. Therefore the normalized error criterion may be much harder than the absolute error criterion, and this is the reason for a number of negative results for this error criterion. It turns out that the isotropic and anisotropic cases are quite different, and we study them in separate subsections. We begin with the case where the data are generated by arbitrary linear functionals. The class $\Lambda^{\rm std}$ is partially covered at the end.

6.1. Isotropic Case with Arbitrary Linear Functionals.

For the isotropic case, $\gamma_\ell = \gamma > 0$, we have
$$\|I_d\| = \sqrt{\lambda_{d,1}} = (1-\omega_\gamma)^{d/2},$$
and since $\lambda_{\gamma,1} = 1 - \omega_\gamma < 1$, the norm of $I_d$ is exponentially small. We are ready to present the following theorem.

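Theorem 6. Consider the function approximation problem $I = \{I_d\}_{d\in\mathbb N}$ for Hilbert spaces with isotropic Gaussian kernels for the class $\Lambda^{\rm all}$ and for the normalized error criterion. Then

• $I$ is not polynomially tractable,

• $I$ is quasi-polynomially tractable with exponent
$$t^{\rm all} = t^{\rm all}(\gamma) = \frac{2}{\ln\Big(\dfrac{1+2\gamma^2+\sqrt{1+4\gamma^2}}{2\gamma^2}\Big)}.$$

That is, for all $d \in \mathbb N$, $\varepsilon \in (0,1)$ and $\delta \in (0,1)$ we have
$$e^{\rm wor-all}(n,H_d) = O\Big(\|I_d\|\, n^{-1/((t^{\rm all}+\delta)(1+\ln d))}\Big) = O\bigg(\frac{n^{-1/((t^{\rm all}+\delta)(1+\ln d))}}{\big(\frac12(1+\sqrt{1+4\gamma^2})+\gamma^2\big)^{d/4}}\bigg),$$
$$n^{\rm wor-nor-all}(\varepsilon,H_d) = O\Big(\exp\big((t^{\rm all}+\delta)(1+\ln d)(1+\ln\varepsilon^{-1})\big)\Big),$$
where the factors in the big $O$ notations are independent of $n$, $\varepsilon^{-1}$ and $d$ but may depend on $\delta$.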

Proof. The lack of polynomial tractability follows, in particular, from Theorem 5.6 of [11]. In fact, the lack of polynomial tractability for the class $\Lambda^{\rm all}$ holds for all tensor product problems with two positive eigenvalues for the univariate case.
For quasi-polynomial tractability we use Theorem 3.3 of [7], which states that quasi-polynomial tractability for the class $\Lambda^{\rm all}$ holds for tensor product problems iff the rate
$$r = \sup\{\,\beta \ge 0 \mid \lim_{n\to\infty}\lambda_{\gamma,n}\, n^{2\beta} = 0\,\}$$
of the univariate eigenvalues is positive and the second largest univariate eigenvalue $\lambda_{\gamma,2}$ is strictly less than the largest univariate eigenvalue $\lambda_{\gamma,1}$. If so, then the exponent of quasi-polynomial tractability is
$$t^{\rm all} = \max\Big(\frac{2}{r}, \frac{2}{\ln(\lambda_{\gamma,1}/\lambda_{\gamma,2})}\Big).$$
In our case, $r = \infty$ and
$$\ln\frac{\lambda_{\gamma,1}}{\lambda_{\gamma,2}} = \ln\frac{1+2\gamma^2+\sqrt{1+4\gamma^2}}{2\gamma^2}.$$
The estimates of $e^{\rm wor-all}(n,H_d)$ and $n^{\rm wor-nor-all}(\varepsilon,H_d)$ follow from the definition of quasi-polynomial tractability. This completes the proof. □

For the isotropic case we lose polynomial tractability for the normalized error criterion, although even strong polynomial tractability is present for the absolute error criterion. This shows qualitatively that the normalized error criterion is much harder. In this case we only have quasi-polynomial tractability. Observe that the exponent of quasi-polynomial tractability depends on $\gamma$ and we have
$$\lim_{\gamma\to 0} t^{\rm all}(\gamma) = 0 \quad \text{and} \quad \lim_{\gamma\to\infty} t^{\rm all}(\gamma) = \infty.$$

For some specific values of $\gamma$ we have

$t^{\rm all}(2^{-1/2}) = 1.5186\ldots$,

$t^{\rm all}(1) = 2.0780\ldots$,

$t^{\rm all}(2^{1/2}) = 2.8853\ldots$
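These constants can be reproduced in a few lines. The Python sketch below takes $\lambda_{\gamma,1}/\lambda_{\gamma,2} = (1+2\gamma^2+\sqrt{1+4\gamma^2})/(2\gamma^2)$, as in the proof of Theorem 6, and uses $r = \infty$, so that only the second term in the maximum defining $t^{\rm all}$ matters. The function names are illustrative.

```python
import math


def omega(gamma):
    """omega_gamma = lambda_{gamma,2} / lambda_{gamma,1}."""
    g2 = gamma * gamma
    return 2.0 * g2 / (1.0 + 2.0 * g2 + math.sqrt(1.0 + 4.0 * g2))


def t_all(gamma):
    """Quasi-polynomial exponent 2 / ln(lambda_{gamma,1} / lambda_{gamma,2}); the 2/r term is 0."""
    return 2.0 / math.log(1.0 / omega(gamma))


for gamma in (2 ** -0.5, 1.0, 2 ** 0.5):
    print(f"t_all({gamma:.4f}) = {t_all(gamma):.4f}")
```

Since $\omega_\gamma$ increases from 0 to 1 as $\gamma$ runs from 0 to $\infty$, the sketch also reflects the two limits stated above.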

6.2. Anisotropic Case with Arbitrary Linear Functionals.

We now consider the sequence $\{\gamma_\ell\}_{\ell\in\mathbb N}$ of shape parameters and ask when we can guarantee strong polynomial tractability. As we shall see, this holds for the class $\Lambda^{\rm all}$ iff $r(\gamma) > 0$, although the exponent of strong polynomial tractability is large for small $r(\gamma)$. More precisely, we have the following theorem, which is similar to Theorem 3.

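Theorem 7. Consider the function approximation problem $I = \{I_d\}_{d\in\mathbb N}$ for Hilbert spaces with anisotropic Gaussian kernels for the class $\Lambda^{\rm all}$ and for the normalized error criterion. Then

• $I$ is strongly polynomially tractable iff $r(\gamma) > 0$. If so, then the exponent is
$$p^{\rm all} = \frac{1}{r(\gamma)}.$$

• Let $r(\gamma) > 0$. Then for all $d \in \mathbb N$, $\varepsilon \in (0,1)$ and $\delta \in (0,1)$ we have
$$e^{\rm wor-all}(n,H_d) = O\big(\|I_d\|\, n^{-r(\gamma)+\delta}\big), \qquad n^{\rm wor-nor-all}(\varepsilon,H_d) = O\big(\varepsilon^{-(1/r(\gamma)+\delta)}\big),$$
where the factors in the big $O$ notations are independent of $n$, $\varepsilon^{-1}$ and $d$ but may depend on $\delta$.

• Furthermore, in the case of ordered shape parameters, i.e., $\gamma_1 \ge \gamma_2 \ge \cdots$, if
$$n^{\rm wor-nor-all}(\varepsilon,H_d) = O(\varepsilon^{-p}\, d^{q}) \quad \text{for all } \varepsilon \in (0,1) \text{ and } d \in \mathbb N,$$
then $p \ge p^{\rm all} = 1/r(\gamma)$, which means that strong polynomial tractability is equivalent to polynomial tractability.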

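Proof. Theorem 5.2 of [11] states that strong polynomial tractability holds iff there exists a positive number $\tau$ such that
$$C_2 := \sup_{d\in\mathbb N}\sum_{j=1}^{\infty}\Big(\frac{\lambda_{d,j}}{\lambda_{d,1}}\Big)^{\tau} = \prod_{\ell=1}^{\infty}\frac{1}{1-\omega_{\gamma_\ell}^{\tau}} < \infty.$$
If so, then $n^{\rm wor-nor-all}(\varepsilon,H_d) \le C_2\,\varepsilon^{-2\tau}$ for all $\varepsilon \in (0,1)$ and $d \in \mathbb N$, and the exponent of strong polynomial tractability is the infimum of $2\tau$ for which $C_2 < \infty$. Clearly, $C_2 < \infty$ iff
$$\sum_{\ell=1}^{\infty}\omega_{\gamma_\ell}^{\tau} < \infty \quad \text{iff} \quad \sum_{\ell=1}^{\infty}\gamma_\ell^{2\tau} < \infty.$$
This holds if $r(\gamma) > 1/(2\tau)$ and only if $r(\gamma) \ge 1/(2\tau)$; in particular, some positive $\tau$ works iff $r(\gamma) > 0$. This also proves that $p^{\rm all} = 1/r(\gamma)$. The estimates on $e^{\rm wor-all}(n,H_d)$ and $n^{\rm wor-nor-all}(\varepsilon,H_d)$ follow from the definition of strong tractability.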

The case of polynomial tractability for ordered shape parameters follows analogously to the proof of Theorem 3. From Theorem 5.2 of [11], we know that the problem is polynomially tractable with $n^{\rm wor-nor-all}(\varepsilon,H_d) = O(\varepsilon^{-2\tau} d^{q_2\tau})$ iff
$$C_2 := \sup_{d\in\mathbb N} d^{-q_2}\Big(\sum_{j=1}^{\infty}\Big(\frac{\lambda_{d,j}}{\lambda_{d,1}}\Big)^{\tau}\Big)^{1/\tau} < \infty.$$
Proceeding as in the proof of Theorem 3, this can happen for ordered shape parameters only if $\tau \ge 1/(2r(\gamma))$. Therefore, $p \ge p^{\rm all} = 1/r(\gamma)$, as claimed. □
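The identity behind $C_2$ in this proof, $\sum_j(\lambda_{d,j}/\lambda_{d,1})^\tau = \prod_\ell 1/(1-\omega_{\gamma_\ell}^\tau)$, is a product of geometric series. The Python sketch below checks it for $d = 2$ by brute-force summation over a truncated index range. It assumes geometric univariate eigenvalues $\lambda_{\gamma,j} = (1-\omega_\gamma)\omega_\gamma^{j-1}$ with the closed form of $\omega_\gamma$ written in the code; the truncation level `kmax` is an arbitrary illustrative choice.

```python
import math
from itertools import product


def omega(gamma):
    """Assumed univariate eigenvalue ratio omega_gamma."""
    g2 = gamma * gamma
    return 2.0 * g2 / (1.0 + 2.0 * g2 + math.sqrt(1.0 + 4.0 * g2))


def normalized_power_sum_truncated(gammas, tau, kmax=200):
    """Sum over exponent vectors k (k_l < kmax) of (lambda_{d,j}/lambda_{d,1})^tau = prod_l w_l^(tau k_l)."""
    ws = [omega(g) for g in gammas]
    return sum(
        math.prod(w ** (tau * k) for w, k in zip(ws, ks))
        for ks in product(range(kmax), repeat=len(ws))
    )


def normalized_power_sum(gammas, tau):
    """Closed form prod_l 1 / (1 - w_l^tau)."""
    return math.prod(1.0 / (1.0 - omega(g) ** tau) for g in gammas)


for gammas in ([1.0, 0.5], [2.0, 0.3]):
    for tau in (0.3, 0.5, 1.0):
        print(gammas, tau, normalized_power_sum_truncated(gammas, tau), normalized_power_sum(gammas, tau))
```

Unlike the absolute-error sum, every factor here exceeds 1 for any $\tau > 0$, so the product over all coordinates stays bounded only if $\omega_{\gamma_\ell}^\tau$ is summable.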

The essence of Theorem 7 is that under the normalized error criterion, strong polynomial tractability and polynomial tractability for the class $\Lambda^{\rm all}$ require that the shape parameters tend to zero polynomially fast, so that $r(\gamma) > 0$. This condition is stronger than what is required for the absolute error criterion.

It is interesting to compare strong polynomial tractability for the absolute and normalized error criteria for the class $\Lambda^{\rm all}$; see Theorems 3 and 7. This is the subject of the next corollary.
Corollary 1. Consider the function approximation problem $I = \{I_d\}_{d\in\mathbb N}$ for Hilbert spaces with isotropic or anisotropic Gaussian kernels for the class $\Lambda^{\rm all}$. Let $r(\gamma)$ be the rate of decay of shape parameters.

• Absolute error criterion:

$I$ is always strongly polynomially tractable with exponent
$$p^{\rm all} = \min\Big(2, \frac{1}{r(\gamma)}\Big) \le 2.$$

• Normalized error criterion:

$I$ is strongly polynomially tractable iff $r(\gamma) > 0$. If so, the exponent is
$$p^{\rm all} = \frac{1}{r(\gamma)}.$$

The strong tractability exponents under the two error criteria are the same provided that $r(\gamma) > 1/2$.
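For a quick comparison, the two exponents in Corollary 1 can be written as functions of $r = r(\gamma)$. In this Python sketch, returning infinity for the normalized criterion at $r = 0$ is only a convention of the sketch meaning "not strongly polynomially tractable"; the function names are hypothetical.

```python
import math


def p_all_absolute(r):
    """Exponent for Lambda^all under the absolute error criterion: min(2, 1/r)."""
    return 2.0 if r == 0 else min(2.0, 1.0 / r)


def p_all_normalized(r):
    """Exponent for Lambda^all under the normalized criterion: 1/r if r > 0, else inf (no strong tractability)."""
    return math.inf if r == 0 else 1.0 / r


for r in (0.0, 0.25, 0.5, 1.0, 4.0, math.inf):
    print(r, p_all_absolute(r), p_all_normalized(r))
```

The two columns agree once $r \ge 1/2$, and the normalized exponent blows up as $r \to 0$.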

6.3. Only Function Values. 

We now turn to the class $\Lambda^{\rm std}$. We do not know if quasi-polynomial tractability holds for the class $\Lambda^{\rm std}$ in the isotropic case. The theorems that we used for the absolute error criterion are not enough for the normalized error criterion. Indeed, no matter how a positive $k$ is defined in (16), we must take $n$ exponentially large in $d$ if we want to guarantee that the error is less than $\varepsilon\|I_d\|$. Similarly, if we use (15), then we must guarantee that $p > 1$, and this makes the number $B$ exponentially large in $d$. We leave as an open problem whether quasi-polynomial tractability holds for the class $\Lambda^{\rm std}$.

We now discuss the initial error for $\lim_{\ell\to\infty}\gamma_\ell = 0$. We have
$$\|I_d\| = \prod_{\ell=1}^{d}(1-\omega_{\gamma_\ell})^{1/2} = \exp\Big(O(1) - \frac12\sum_{\ell=1}^{d}\gamma_\ell^2\Big).$$

For $r(\gamma) \in [0, 1/2)$, the initial error still goes to zero, whereas for $r(\gamma) = 1/2$ it may go to zero or be uniformly bounded from below by a positive number, and finally for $r(\gamma) > 1/2$ it is always uniformly bounded from below by a positive number. For example, take $\gamma_\ell = \ell^{-\alpha}\ln^{\beta}(1+\ell)$ for a positive $\alpha$ and real $\beta$. Then $r(\gamma) = \alpha$. For $\alpha = \frac12$, the initial error goes to zero for $\beta \ge -\frac12$, and is of order 1 if $\beta < -\frac12$.
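This example is easy to explore numerically. The Python sketch below evaluates $\|I_d\| = \prod_{\ell\le d}(1-\omega_{\gamma_\ell})^{1/2}$ for $\gamma_\ell = \ell^{-\alpha}\ln^\beta(1+\ell)$, assuming the closed form of $\omega_\gamma$ written in the code. With $\alpha = 1/2$ and $\beta = 0$ the norm keeps shrinking, roughly like $d^{-1/2}$, whereas $\beta = -1$ keeps it bounded away from zero.

```python
import math


def omega(gamma):
    """Assumed univariate eigenvalue ratio omega_gamma."""
    g2 = gamma * gamma
    return 2.0 * g2 / (1.0 + 2.0 * g2 + math.sqrt(1.0 + 4.0 * g2))


def initial_error(alpha, beta, d):
    """||I_d|| = prod_{l<=d} (1 - omega_{gamma_l})^(1/2) for gamma_l = l^(-alpha) * ln(1 + l)^beta."""
    log_norm = 0.0
    for l in range(1, d + 1):
        gamma = l ** -alpha * math.log1p(l) ** beta
        log_norm += 0.5 * math.log1p(-omega(gamma))
    return math.exp(log_norm)


for beta in (0.0, -1.0):
    print(beta, [round(initial_error(0.5, beta, d), 4) for d in (10, 1_000, 100_000)])
```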

This discussion shows that for $r(\gamma) > 1/2$ there is really no difference between the absolute and normalized error criteria. This means that for $r(\gamma) > 1/2$ we can apply Theorem 5 for the class $\Lambda^{\rm std}$ with $\varepsilon$ replaced by $\varepsilon\|I_d\| = \Theta(\varepsilon)$. For $r(\gamma) = 1/2$, Theorem 4 can be applied if we assume additionally that $\sum_{\ell=1}^{\infty}\gamma_\ell^2 < \infty$. The last assumption implies that $\|I_d\| = \Theta(1)$. We summarize this discussion in the following corollary.

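Corollary 2. Consider the function approximation problem $I = \{I_d\}_{d\in\mathbb N}$ for Hilbert spaces with anisotropic Gaussian kernels for the class $\Lambda^{\rm std}$ and for the normalized error criterion. Assume that
$$r(\gamma) > \tfrac12 \quad \text{or} \quad \Big(r(\gamma) = \tfrac12 \ \text{ and } \ \sum_{\ell=1}^{\infty}\gamma_\ell^2 < \infty\Big).$$
Then

• $I$ is strongly polynomially tractable with exponent at most
$$\frac{1}{r(\gamma)} + \frac{1}{2r^2(\gamma)} \le 4.$$

• For all $d \in \mathbb N$, $\varepsilon \in (0,1)$ and $\delta \in (0,1)$ we have
$$e^{\rm wor-std}(n,H_d) = O\Big(n^{-r^2(\gamma)/(r(\gamma)+1/2)+\delta}\Big), \qquad n^{\rm wor-nor-std}(\varepsilon,H_d) = O\Big(\varepsilon^{-(1/r(\gamma)+1/(2r^2(\gamma))+\delta)}\Big),$$
where the factors in the big $O$ notations are independent of $n$, $\varepsilon^{-1}$ and $d$ but may depend on $\delta$.

The case $r(\gamma) < 1/2$ is open. We do not know if polynomial tractability holds for the class $\Lambda^{\rm std}$ in this case.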